Unit Root Tests in Time Series - Ebook
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
Titles include:
Simon P. Burke and John Hunter
MODELLING NON-STATIONARY TIME SERIES
Michael P. Clements
EVALUATING ECONOMETRIC FORECASTS OF ECONOMIC AND FINANCIAL VARIABLES
Leslie Godfrey
BOOTSTRAP TESTS FOR REGRESSION MODELS
Terence C. Mills
MODELLING TRENDS AND CYCLES IN ECONOMIC TIME SERIES
Kerry Patterson
A PRIMER FOR UNIT ROOT TESTING
Kerry Patterson
UNIT ROOT TESTS IN TIME SERIES VOLUME 1
Key Concepts and Problems
Kerry Patterson
UNIT ROOT TESTS IN TIME SERIES VOLUME 2
Extensions and Developments
Kerry Patterson
may be liable to criminal prosecution and civil claims for damages.
The author has asserted his right to be identified as the author of this
work in accordance with the Copyright, Designs and Patents Act 1988.
First published 2012 by
PALGRAVE MACMILLAN
Palgrave Macmillan in the UK is an imprint of Macmillan Publishers Limited,
registered in England, company number 785998, of Houndmills, Basingstoke,
Hampshire RG21 6XS.
Palgrave Macmillan in the US is a division of St Martin’s Press LLC,
175 Fifth Avenue, New York, NY 10010.
Palgrave Macmillan is the global academic imprint of the above companies
and has companies and representatives throughout the world.
Palgrave® and Macmillan® are registered trademarks in the United States,
the United Kingdom, Europe and other countries.
ISBN: 978–0–230–25026–0 hardback
ISBN: 978–0–230–25027–7 paperback
This book is printed on paper suitable for recycling and made from fully
managed and sustained forest sources. Logging, pulping and manufacturing
processes are expected to conform to the environmental regulations of the
country of origin.
A catalogue record for this book is available from the British Library.
A catalog record for this book is available from the Library of Congress.
10 9 8 7 6 5 4 3 2 1
21 20 19 18 17 16 15 14 13 12
Printed and bound in Great Britain by
CPI Antony Rowe, Chippenham and Eastbourne
Detailed Contents vi
List of Tables xx
List of Figures xxiii
Preface xxix
2.3.2.i Serially correlated errors 49
2.3.2.ii Simulation results 51
2.4 The range unit root test 51
2.4.1 The range and new ‘records’ 52
2.4.1.i The forward range unit root test 53
2.4.1.ii Robustness of RUR^F 54
2.4.2 The forward–backward range unit root test 55
2.4.3 Robustness of RUR^F and RUR^FB 56
2.4.4 The range unit root tests for trending alternatives 59
2.5 Variance ratio tests 60
2.5.1 A basic variance ratio test 61
2.5.2 Breitung variance ratio test 64
2.6 Comparison and illustrations of the tests 66
2.6.1 Comparison of size and power 67
2.6.2 Linear or exponential random walks? 69
2.6.3 Empirical illustrations 70
2.6.3.i Ratio of gold–silver price (revisited) 70
2.6.3.ii Air revenue passenger miles (US) 71
2.7 Concluding remarks 73
Questions 74
3 Fractional Integration 76
Introduction 76
3.1 A fractionally integrated process 78
3.1.1 A unit root process with fractionally integrated noise 78
3.1.2 Binomial expansion of (1 − L)d 79
3.1.2.i AR coefficients 79
3.1.2.ii MA coefficients 80
3.1.2.iii The fractionally integrated model in terms
of the Gamma function 80
3.1.3 Two definitions of an I(d) process, d fractional 82
3.1.3.i Partial summation 82
3.1.3.ii Direct definition 83
3.1.4 The difference between type I and type II processes 84
3.2.1.ii Autocorrelations 88
3.2.1.iii Inverse autocorrelations 89
3.2.2 Graphical properties of some simple ARFIMA
models 89
3.3 What kind of models generate fractional d? 93
3.3.1 The error duration model (Parke, 1999) 93
3.3.1.i The model 93
3.3.1.ii Motivation: the survival rate of firms 94
3.3.1.iii Autocovariances and survival probabilities 94
3.3.1.iv The survival probabilities in a long-memory
process 96
3.3.1.v The survival probabilities in a short-memory,
AR(1), process 97
3.3.2 An example: the survival rate for US firms 97
3.3.3 Error duration and micropulses 99
3.3.4 Aggregation 99
3.3.4.i Aggregation of ‘micro’ relationships 100
3.3.4.ii The AR(1) coefficients are ‘draws’ from the
beta distribution 101
3.3.4.iii Qualifications 101
3.4 Dickey-Fuller tests when the process is fractionally
integrated 102
3.5 A fractional Dickey-Fuller (FDF) test for unit roots 105
3.5.1 FDF test for fractional integration 106
3.5.2 A feasible FDF test 107
3.5.3 Limiting null distributions 107
3.5.4 Serially correlated errors: an augmented FDF, AFDF 108
3.5.5 An efficient FDF test 110
3.5.6 EFDF: limiting null distribution 112
3.5.7 Serially correlated errors: an augmented EFDF, AEFDF 113
3.5.8 Limiting null distribution of tη (d̂T ) 114
3.6 FDF and EFDF tests: deterministic components 114
3.7 Locally best invariant (LBI) tests 118
3.7.1 An LM-type test 118
3.8 Power 127
3.8.1 Power against fixed alternatives 127
3.8.2 Power against local alternatives 128
3.8.3 The optimal (power-maximising) choice of d1
in the FDF test(s) 129
3.8.4 Illustrative simulation results 131
3.9 Example: US wheat production 135
3.10 Concluding remarks 141
Questions 142
Appendix 3.1 Factorial expansions for integer and non-integer d 149
A3.1 What is the meaning of (1 − L)d for fractional d? 149
A3.2 Applying the binomial expansion to the fractional difference
operator 150
A3.3 The MA coefficients in terms of the gamma function 151
Appendix 3.2 FDF test: assuming known d1 152
4.3.1.vi Testing the unit root hypothesis using the
GPH estimator 171
4.4 Pooling and tapering 171
4.4.1 Pooling adjacent frequencies 172
4.4.2 Tapering 172
4.5 Variants of GPH estimation using pooling and tapering 174
4.5.1 Pooled and pooled and tapered GPH 174
4.5.2 A bias-adjusted log-periodogram estimator 175
4.5.2.i The AG bias-reduced estimator 175
4.5.2.i.a The estimator 175
4.5.2.i.b Asymptotic properties of the AG
estimator 178
4.5.2.i.c Bias, rmse and mopt comparisons;
comparisons of asymptotics 179
4.6 Finite sample considerations and feasible estimation 183
4.6.1 Finite sample considerations 185
4.6.2 Feasible estimation (iteration and ‘plug-in’ methods) 186
4.7 A modified LP estimator 188
4.7.1 The modified discrete Fourier transform and
log-periodogram 189
4.7.2 The modified DFT and modified log-periodogram 190
4.7.3 The modified estimator 191
4.7.4 Properties of the modified estimator 191
4.7.4.i Consistency 191
4.7.4.ii Asymptotic normality for d ∈ [0.5, 2] if the
initial condition is zero or known 192
4.7.4.iii Reported simulation results 192
4.7.5 Additional simulations 193
4.7.6 Testing the unit root hypothesis 197
4.8 The approximate Whittle contrast and the Gaussian
semi-parametric estimator 200
4.8.1 A discrete contrast function (Whittle’s contrast function) 200
4.8.2 The Whittle contrast as a discrete approximation
to the likelihood function 201
4.8.6 The presence of a trend 206
4.8.7 Consistency and the asymptotic distribution of d̂LW 207
4.8.8 More on the LW estimator for d in the nonstationary
range 207
4.9 Modifications and extensions of the local Whittle estimator 209
4.9.1 The modified local Whittle estimator 209
4.9.2 The ‘exact’ local Whittle estimator 210
4.9.2.i A problem 210
4.9.2.ii The exact LW estimator, with known initial
value: d̂ELW 211
4.9.2.iii Properties of the exact LWE 213
4.9.2.iv The exact LW estimator with unknown
initial value: d̂FELW,μ 214
4.9.2.v Properties of d̂FELW,μ 215
4.9.2.vi Allowing for a deterministic polynomial
trend 215
4.10 Some simulation results for LW estimators 216
4.11 Broadband – or global – estimation methods 221
4.12 Illustrations 223
4.12.1 The US three-month T-bill 224
4.12.2 The price of gold in US$ 228
4.13 Concluding remarks 231
Questions 232
Appendix: The Taylor series expansion of a logarithmic function 238
white noise case 254
5.2.10.iii ESTAR(2) 254
5.2.11 Existence and stability of stationary points 255
5.2.11.i Equilibrium of the STAR process 255
5.2.11.ii Stability of singular points and limit cycles 256
5.2.11.iii Some special cases of interest 258
5.3 LSTAR transition function 259
5.3.1 Standardisation of the transition variable in the
LSTAR model 260
5.3.2 The LSTAR model 261
5.4 Further developments of STAR models: multi-regime models 264
5.5 Bounded random walk (BRW) 265
5.5.1 The basic model 265
5.5.2 The bounded random walk interval 266
5.5.3 Simulation of the BRW 269
5.6 Testing for nonstationarity when the alternative is a
nonlinear stationary process 269
5.6.1 The alternative is an ESTAR model 269
5.6.1.i The KSS test for a unit root 269
5.6.1.ii A joint test 271
5.6.1.iii An Inf-t test 271
5.6.1.iv Size and power of unit root tests against
ESTAR nonlinearity 274
5.6.1.v Size 274
5.6.1.vi Power 274
5.6.2 Testing for nonstationarity when the alternative is a
nonlinear BRW 281
5.7 Misspecification tests to detect nonlinearity 283
5.7.1 Teräsvirta’s tests 283
5.7.2 The Escribano-Jordá variation 286
5.8 An alternative test for nonlinearity: a conditional mean
encompassing test 288
5.8.1 Linear, nonnested models 288
5.8.2 Nonlinear, nonnested models: ESTAR and LSTAR 290
5.10.2 Direct nonlinear estimation 302
5.10.3 Simulation set-up 303
5.10.3.i Setting starting values for direct estimation 303
5.10.3.ii Grid search 303
5.10.3.iii The error variance and admissibility rules 304
5.10.3.iv The set-up and simulation results 304
5.11 Estimation of the BRW 308
5.11.1 Simulation set-up 309
5.11.2 Simulation results 310
5.12 Concluding remarks 314
Questions 316
6.3.3 Regime-dependent heteroscedasticity 343
6.4 Testing for nonstationarity in TAR and MTAR models 344
6.4.1 Enders-Granger model, 2RTAR and 2RMTAR 344
6.4.1.i Known threshold 345
6.4.1.ii Unknown threshold 348
6.4.2 Testing for nonstationarity in a 3RTAR 348
6.4.2.i Testing for two unit roots conditional on a
unit root in the inner regime 348
6.4.2.ii Testing for three unit roots 351
6.5 Test for a threshold effect in a stationary TAR model 352
6.5.1 A stationary 2RTAR 353
6.5.2 Test for a threshold effect in a stationary multi-regime
TAR, with unknown regime separation 355
6.5.3 A test statistic for a null hypothesis on κ 356
6.6 Tests for threshold effects when stationarity is not
assumed (MTAR) 356
6.6.1 Known regime separation 357
6.6.2 Unknown regime separation 357
6.7 Testing for a unit root when it is not known whether
there is a threshold effect 360
6.8 An empirical study using different nonlinear models 361
6.8.1 An ESTAR model for the dollar–sterling exchange rate 361
6.8.1.i Initial tests 361
6.8.1.ii Estimation 363
6.8.2 A bounded random walk model for the dollar–sterling
exchange rate 367
6.8.2.i Estimation results 367
6.8.2.ii An estimated random walk interval (RWI) 368
6.8.3 Two- and three-regime TARs for the exchange rate 369
6.8.4 A threshold effect? 372
6.8.5 The unit root null hypothesis 373
6.9 Concluding remarks 374
Questions 376
7.2.2 Innovation outlier (IO) 389
7.2.2.i IO Model 1 389
7.2.2.ii IO Model 2 391
7.2.2.iii IO Model 3 392
7.3 Development: higher-order dynamics 393
7.3.1 Stationary processes (the alternative hypothesis) 393
7.3.1.i Model 1 set-up 393
7.3.1.ii Model 2 set-up 394
7.3.1.iii Model 3 set-up 395
7.3.2 Higher-order dynamics: AO models 395
7.3.2.i Summary of models: AO case 396
7.3.3 Higher-order dynamics: IO models 397
7.3.3.i Summary of models: IO case 399
7.3.4 Higher-order dynamics: distributed outlier
(DO) models 400
7.3.4.i Model 1 400
7.3.4.ii Model 2 402
7.3.4.iii Model 3 403
7.3.4.iv Summary of models: DO case 403
7.4 AO or IO? 404
7.4.1 DO to AO 404
7.4.2 DO to IO 405
7.4.3 Distributions of test statistics 405
7.4.4 Model selection by log likelihood/information criteria 408
7.5 Example AO(1), IO(1) and DO(1) 410
7.5.1 Estimation 410
7.5.2 Bootstrapping the critical values 411
7.6 The null hypothesis of a unit root 412
7.6.1 A break under the null hypothesis 412
7.6.2 AO with unit root 412
7.6.2.i AO Model 1 413
7.6.2.ii AO Model 2 413
7.6.2.iii AO Model 3 413
7.6.3 IO with unit root 413
7.7.1 The AO models 416
7.7.2 IO Models 417
7.7.2.i IO Model 1 417
7.7.2.ii IO Model 2 418
7.7.2.iii IO Model 3 419
7.7.3 DO Models 420
7.7.3.i DO Model 1 420
7.7.3.ii DO Model 2 420
7.7.3.iii DO Model 3 421
7.8 Critical values 422
7.8.1 Exogenous versus endogenous break 422
7.8.2 Critical values 422
7.9 Implications of a structural break for DF tests 423
7.9.1 Spurious non-rejection of the null hypothesis of a
unit root 423
7.9.2 The set-up 423
7.9.3 Key results 424
7.9.3.i Data generated by Model 1 425
7.9.3.ii Data generated by Model 1a 426
7.9.3.iii Data generated by Model 2 427
7.9.3.iv Data generated by Model 3 428
7.10 Concluding remarks 429
Questions 431
8.2.1.ii The decision rule 448
8.2.1.iii Critical values 448
8.2.2 Significance of the coefficient(s) on the dummy
variable(s) 448
8.2.2.i The ‘VP criteria’ 448
8.2.2.ii Critical values 450
8.2.3 A break under the null hypothesis 451
8.2.3.i Invariance of standard test statistics 451
8.3 Further developments on break date selection 453
8.3.1 Lee and Strazicich (2001): allowing a break under
the null 454
8.3.1.i Simulation results 454
8.3.1.ii Use BIC to select the break date? 456
8.3.2 Harvey, Leybourne and Newbold (2001) 457
8.3.2.i The break date is incorrectly chosen 457
8.3.3 Selection of break model under the alternative 457
8.3.3.i Misspecifying the break model: what are the
consequences? 458
8.3.3.ii No break under the null 459
8.3.3.iii Break under the null 460
8.3.3.iv Simulation results 460
8.3.3.v Implications of choosing the wrong model 463
8.4 What kind of break characterises the data? 463
8.4.1 Methods of selecting the type of break 464
8.5 Multiple breaks 469
8.5.1 AO Models 470
8.5.2 IO Models 472
8.5.3 Grid search over possible break dates 473
8.5.3.i Selecting the break dates 473
8.5.3.ii Test statistics 473
8.6 Illustration: US GNP 474
8.6.1 A structural break? 475
8.6.2 Which break model? 478
8.6.3 The break date? 479
8.6.3.i Break dates suggested by different criteria 479
8.6.3.ii AO Model 2 and IO Models 1 and 2: estimation
results with BIC date of break 480
8.6.3.iii The estimated long-run trend functions 480
8.6.3.iii.a AO Model 2 482
8.6.3.iii.b IO Model 1 482
8.6.3.iii.c IO Model 2 482
8.6.4 Inference on the unit root null hypothesis 484
8.6.4.i AO Model 2 484
8.6.4.ii IO Models 484
8.6.4.ii.a IO Model 1 484
8.6.4.ii.b IO Model 2 487
8.6.4.ii.c A break under null 487
8.6.5 Overall 488
8.6.6 Two breaks 489
8.6.6.i IO(2, 3) model 489
8.7 Concluding remarks 492
Questions 493
References 528
Subject Index 546
5% nominal level) 33
2.2 Critical values for KM test statistics 38
2.2a Critical values for V1 and V2 38
2.2b Critical values for U1 and U2 38
2.3 Sensitivity of critical values for V1, V2; T = 100, ψ1 = 0.0, ψ1 = 0.8 39
2.4 Power of KM nonnested tests for 5% nominal size 40
2.5 Regression details: gold–silver price ratio 42
2.6 Regression details: world oil production 44
2.7 Size of δ̂r,μ and τ̂r,μ using quantiles for δ̂μ and τ̂μ 46
2.8 Critical values of the rank-based DF tests δ̂r,μ , τ̂r,μ , δ̂r,β and τ̂r,β 47
2.9 Quantiles for the BG rank-score tests 50
2.10 Critical values of the forward range unit root test, RUR^F(α, T) 54
2.11 Critical values of the empirical distribution of RUR^FB(α, T) 56
2.12 Actual and nominal size of DF and range unit root tests when
εt ∼ t(3) and niid(0, 1) quantiles are used 57
2.13 Critical values of νrat,j 66
2.14 Summary of size and power for rank, range and variance
ratio tests 68
2.15 Levels or logs, additional tests: score and variance ratio tests
(empirical size) 70
2.16 Tests for a unit root: ratio of gold to silver prices 71
2.17 Regression details: air passenger miles, US 72
2.18 Tests for a unit root: air passenger miles 72
2.19 KM tests for linearity or log-linearity: air passenger miles 73
3.1 Empirical survival rates Sk and conditional survival rates
Sk /Sk−1 for US firms 97
3.2 Power of ADF τ̂μ test for fractionally integrated series 105
3.3 Optimal d1 , d1∗ , for use in PAFDF t̂γ (d1 ) tests (for local
alternatives) 131
3.4a Power of various tests for fractional d, (demeaned data), T = 200 133
3.4b Power of various tests for fractional d, (demeaned data), T = 500 133
3.5a Power of various tests for fractional d, (detrended data), T = 200 136
3.5b Power of various tests for fractional d, (detrended data), T = 1,000 136
3.6 Estimation of models for US wheat production 139
3.7 Tests with a fractionally integrated alternative; p = 0 and p = 2
for augmented tests 139
plug-in version of the AG estimator, d̂PAG (r); d = 0, T = 512 188
4.5 Simulation results for d̂GPH and d̂MGPH with AR(1) short-run
dynamics, φ1 = 0.3, T = 512 193
4.6 Properties of d̂LW for stationary and nonstationary cases 208
4.7 LW estimators considered in the simulations 216
4.8 Semi-parametric estimators considered in this section 223
4.9 Estimates of d for the US three-month T-bill rate 225
4.10 Estimates of d for the price of gold 229
A4.1 m_GPH^opt and RMSE at m_GPH^opt for d̂AG (r), r = 0, 1 234
A4.2 Ratio of asymptotic bias for d̂AG (0) and d̂AG (1),
φ1 = 0.9 235
5.1 Critical values of the KSS test for nonlinearity tNL 271
5.2 Critical values of the Inf-t test for nonlinearity 274
5.3 Empirical size of unit root tests for ESTAR nonlinearity: 5%
nominal size 275
5.4 Power of some unit root tests against an ESTAR alternative 275
5.5 Power of τ̂μ and τ̂μ^ws against the BRW alternative for different
values of α(i) and σε2 282
5.6 Nonlinearity testing: summary of null and alternative hypotheses 286
5.7 Summary of illustrative tests for nonlinearity 288
5.8 Estimation of the BRW model 311
5.9 Quantiles of the LR test statistic and quantiles of χ2 (1) 314
6.1 Integers for the ergodicity condition in a TAR model 330
6.2 Quantiles for testing for a unit root in a 2RTAR model and in a
2RMTAR model 346
6.3 Simulated quantiles of W(0, 0) and exp[W(0, 0)/2] 351
6.4 Unit root test statistics for the US: UK real exchange rate 362
6.5a Tests for nonlinearity (delay parameter = 1) 363
6.5b Tests for nonlinearity (delay parameter = 2) 363
6.6 Estimation of ESTAR models for the dollar–sterling real
exchange rate 365
6.7 Estimation of a BRW model for the dollar–sterling real
exchange rate 368
6.8 Estimation of the (linear) ADF(2) model for the dollar–sterling
real exchange rate 370
7.1 Asymptotic critical values for AO and IO models 1, 2 and 3,
with λb = λcb 423
8.1 Asymptotic 95% quantile for Wald test statistics for a break in
the trend 445
8.2 Model 2 asymptotic and finite sample 5% critical values for
Sup-W(λb , γ) for the joint null hypothesis of a break in the
trend function and unit root 446
8.3 Asymptotic critical values for τ̃γ^(i)(λb), AO and IO models 1, 2
and 3, with λb selected by different criteria 449
8.4 5% critical values for the τ̃γ^(i)(λb) test, i = Model 1, Model 1a;
break date = Tb = T̃b + 1 458
8.5 Summary of Wald-type diagnostic break tests for one break 478
8.6 BIC for the AO and IO versions of Models 1, 2, and 3 479
8.7 Break dates suggested by various criteria in Models 1, 2 and 3 480
8.8 Estimation of AO and IO Models (single-break model) 482
8.9 Estimation of two breaks, IO (2, 3) Model 483
8.10 Bootstrap critical values for two-break IO (2, 3) model 492
9.1 Quantiles of test statistics: Z1 , Z2 , E1 and E2 505
9.2 Size of τ̂-type tests in the presence of GARCH(1, 1) errors from
different distributions; T = 200, nominal 5% size 510
9.3 Power of τ̂-type tests in the presence of GARCH(1, 1) errors;
ρ = 0.95, 0.9, T = 200 and ut ∼ N(0, 1), nominal 5% size 511
9.4 Critical values for τ̂μ^rma and τ̂β^rma 516
9.5 Size of various τ̂-type tests in the presence of GARCH(1, 1)
errors; T = 200, nominal 5% size 517
9.6 Power of τ̂-type tests in the presence of GARCH (1, 1) errors;
ρ = 0.95, 0.9, T = 200 and ut ∼ N(0, 1), nominal 5% size 518
9.7 Estimation details: ADF(3) for the US savings ratio, with and
without GARCH(1, 1) errors 524
1.3 13
1.4 CDFs of F, Sup-C and Cave tests 14
1.5 CDFs of F, Sup-C and Cave tests for TAR 17
1.6 Hog–corn price ratio (log, monthly data) 19
2.1 Gold–silver price ratio 42
2.2 World oil production 43
2.3 QQ plots of DF τ̂ tests, niid v. t(3) 57
2.4 QQ plots of range unit-root tests, niid v. t(3) 58
2.5 US air passenger miles 71
3.1 Autocorrelations for FI(d) processes 90
3.2 MA coefficients for FI(d) processes 90
3.3 AR coefficients for FI(d) processes 91
3.4 Simulated data for fractional d 92
3.5 Simulated data for fractional d and serial correlation 92
3.6 Survival and conditional survival probabilities 96
3.7 Survival rates of US firms: actual and fitted 98
3.8 Power of DF tests against an FI(d) process 103
3.9 The drift functions for tγ , tη and LM0 130
3.10a Size-adjusted power (data demeaned) 134
3.10b Size-adjusted power (data demeaned), near unit root 134
3.11a Size-adjusted power (data detrended) 137
3.11b Size-adjusted power (data detrended), near unit root 137
3.12a US wheat production (logs) 138
3.12b US wheat production (logs, detrended) 138
4.1 The spectral density of yt for an AR(1) process 158
4.2 Optimal m for GPH estimator 170
4.3a Asymptotic bias, d̂(AG) (r), T = 500 180
4.3b Asymptotic bias, d̂(AG) (r), T = 2,500 180
4.4a Asymptotic rmse, d̂(AG) (r), T = 500 181
4.4b Asymptotic rmse, d̂(AG) (r), T = 2,500 181
4.5 Bias of selected GPH and MGPH variants, zero initial value 195
4.6 Mse of selected GPH and MGPH variants, zero initial value 195
4.7 Bias of selected GPH and MGPH variants, non-zero initial value 196
4.8 Mse of selected GPH and MGPH variants, non-zero initial value 196
4.9 Bias of selected GPH and MGPH variants, trended data, β1 = 1 197
4.10 Mse of selected GPH and MGPH variants, trended data, β1 = 1 198
4.11a Bias of selected GPH and MGPH variants, trended data, β1 = 10 199
4.11b Mse of selected GPH and MGPH variants, trended data, β1 = 10 199
4.12 Mean of best of LW estimators, zero initial value 217
4.13 Mse of best of LW estimators, zero initial value 218
4.14 Mean of best of LW estimators, non-zero initial value 219
4.15 Mse of best of LW estimators, non-zero initial value 220
4.16 Mean of best of LW estimators, trended data 220
4.17 Mse of best of LW estimators, trended data 221
4.18 US T-bill rate 224
4.19a Estimates of d as bandwidth varies, US T-bill rate 227
4.19b d̂ELW,μ as bandwidth varies, US T-bill rate 227
4.20 Gold price US$ (end month) 1974m1 to 2004m12 228
4.21a Estimates of d as bandwidth varies, price of gold 230
4.21b d̂ELW,μ as bandwidth varies, price of gold 230
5.1 ESTAR transition function 244
5.2a ESTAR and random walk, θ = 0.02 245
5.2b ESTAR transition function, θ = 0.02 245
5.3a ESTAR and random walk, θ = 2.0 246
5.3b ESTAR transition function, θ = 2.0 246
5.4a AESTAR and random walk, θ+ = 0.02, θ− = 2.0 247
5.4b AESTAR transition function, θ+ = 0.02, θ− = 2.0 247
5.5a AESTAR and random walk, γ1 + = −1.0, γ1 − = −0.1 248
5.5b AESTAR transition function, γ1 + = −1.0, γ1 − = −0.1 248
5.6a ESTAR variable ar coefficient, βt , θ = 0.02, γ1 = −1.0 249
5.6b ESTAR variable ar coefficient, βt , θ = 2.0, γ1 = −1.0 249
5.7a ESTAR variable ar coefficient, βt , θ+ = 0.02, θ− = 2.0, γ1 = −1.0 250
5.7b ESTAR variable ar coefficient, βt , γ1+ = −1.0, γ1− = −0.1, θ = 0.02 250
5.8 ESTAR(2) simulation, multiple equilibria 257
5.9 ESTAR(2) simulation, stable limit cycle 258
5.10 LSTAR transition function 261
5.11a LSTAR and random walk, ψ = 0.75 262
5.11b LSTAR transition function, ψ = 0.75 262
5.12a LSTAR and random walk, ψ = 6.0 263
5.12b LSTAR transition function, ψ = 6.0 263
5.13a Bounded random walk interval, α1 = 3, α2 = 3 267
5.13b Bounded and unbounded random walks, α1 = 3, α2 = 3 267
5.14a Bounded random walk interval, α1 = 5, α2 = 5 268
5.14b Bounded and unbounded random walks, α1 = 5, α2 = 5 268
5.15 Inf-t statistics for a unit root 273
5.16a ESTAR power functions, γ1 = −1, σε = 1 276
5.16b ESTAR power functions, γ1 = −0.1, σε = 1 276
5.17a ESTAR power functions, γ1 = −1, σε = 0.4 277
5.17b ESTAR power functions, γ1 = −0.1, σε = 0.4 277
5.24 308
5.25 rmse of estimates of φ1 as θ varies 309
5.26 Mean of estimates of γ1 as σε and θ vary 310
5.27 rmse of estimates of γ1 as σε and θ vary 311
5.28 Distribution of bias of estimates of α0 and α1 , symmetry imposed 312
5.29 Distribution of bias of estimates of α0 and α1 , symmetry not
imposed 312
5.30 QQ plots of NLS estimates of bias of α1 313
5.31 Distribution of the LR test statistic for symmetry 315
5.32 QQ plot of LR test statistic for symmetry against χ2 (1) 315
A5.1 Plot of quantiles for STAR PSE test statistics 324
6.1 Dollar–sterling real exchange rate (logs) 362
6.2 ESTAR(2): Residual sum of squares 364
6.3 Two-scale figure of real $: £ rate and β̂t 367
7.1 F test of AO restriction 407
7.2 F test of IO restriction 407
7.3 Selection by AIC/BIC (relative to DO) 408
7.4 Selection by max-LL, AO and IO models 409
7.5 Power of DF test: Model 1 level break 426
7.6 Power of DF test: Model 1a level break 427
7.7 Asymptotic n-bias: Models 2 and 3 428
7.8 Power of DF test: Model 2 slope break 429
7.9 Power of DF test: Model 3 slope break 429
8.1 Model 1 correct 466
8.2 Model 2 correct 467
8.3 Model 3 correct 468
8.4 Model 2 correct: ρ = 0, ρ = 0.9 470
8.5 US GNP (log and growth rate) 475
8.6a Sup-W test, null: no break in intercept 476
8.6b Sup-W test, null: no break in intercept + slope 476
8.6c Sup-W test, null: no break in slope 477
8.6d ZM Sup-W test: unit root and no breaks 477
8.7a AO Model 2: break date varies 481
8.7b IO Model 1: break date varies 481
8.8a US GNP, split trend, AO Model 2, break = 1945 485
8.8b US GNP, split trend, AO Model 2, break = 1938 485
9.3 Simulated time series with GARCH(1, 1) errors 508
9.4 US personal savings ratio (%) 523
τ̂β standard DF t-type test statistic using detrended data
δ̂ ≡ T(ρ̂ − 1), where ρ̂ is the least squares (LS) estimator
δ̂μ as in the specification of τ̂μ
δ̂β as in the specification of τ̂β
⇒D convergence in distribution (weak convergence)
→p convergence in probability
→ tends to, for example ε tends to zero, ε → 0
→ mapping
⇒ implies
∼ is distributed as (or, from the context, left and right hand sides
approach equality)
≡ definitional equality
≠ not equals
≈ approximately
Φ(z) the cumulative distribution function of the standard normal
distribution
ℝ the set of real numbers; the real line (−∞ to ∞)
ℝ+ the positive half of the real line
N+ the set of nonnegative integers
εt white noise unless explicitly excepted
∏_{j=1}^{n} xj the product of xj , j = 1, …, n
∑_{j=1}^{n} xj the sum of xj , j = 1, …, n
0+ approach zero from above
0− approach zero from below
L the lag operator, L^j yt ≡ yt−j
Δ the first difference operator, Δ ≡ (1 − L)
Δ_s the s-th difference operator, Δ_s ≡ (1 − L^s)
Δ^s the s-th multiple of the first difference operator, Δ^s ≡ (1 − L)^s
⊂ a proper subset of
⊆ a subset of
∩ intersection of sets
∪ union of sets
∈ an element of
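The difference and fractional-difference operators listed above (and developed in Chapter 3 via the binomial expansion of (1 − L)^d) can be made concrete with a short sketch. This is an illustrative implementation, not code from the book: the function names are invented here, and truncating pre-sample values to zero corresponds to the "type II" convention discussed in Chapter 3.

```python
def frac_diff_coeffs(d, n):
    """Coefficients b_j in the binomial expansion
    (1 - L)^d = sum_{j>=0} b_j L^j,
    computed by the recursion b_0 = 1, b_j = b_{j-1} * (j - 1 - d) / j."""
    b = [1.0]
    for j in range(1, n + 1):
        b.append(b[-1] * (j - 1 - d) / j)
    return b

def frac_diff(y, d):
    """Apply the truncated fractional difference (1 - L)^d to a series y,
    treating pre-sample values as zero (type II convention).
    For d = 1 this reduces to the first difference, y_t - y_{t-1}
    (with the first observation differenced against an implicit zero)."""
    n = len(y) - 1
    b = frac_diff_coeffs(d, n)
    return [sum(b[j] * y[t - j] for j in range(t + 1)) for t in range(len(y))]

# Integer d recovers the ordinary difference operator: (1 - L)^1 = 1 - L,
# so the coefficients are 1, -1 and then zeros.
print(frac_diff_coeffs(1.0, 3))
# Fractional d gives slowly decaying, never-terminating weights.
print(frac_diff_coeffs(0.5, 3))
```

For fractional d the weights decay hyperbolically rather than cutting off at a finite lag, which is the mechanical source of the long memory in an I(d) process.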
This book is the second volume of Unit Root Tests in Time Series and is subtitled
‘Extensions and Developments’. The first volume was published in 2011
(Patterson, 2011) and subtitled ‘Key Concepts and Problems’ (referred to herein as UR,
Vol. 1). Additionally, a third contribution, although the first in the sequence,
entitled A Primer for Unit Root Testing (referred to herein as A Primer), was pub-
lished in 2010 (Patterson, 2010), and completes the set. The books can be read
independently depending on the reader’s background and interests.
I conceived this project around ten years ago, recognising a need to present
and critically assess the key developments in unit root testing, a topic that barely
twenty years before was only occasionally referenced in the econometrics liter-
ature, although its importance had been recognised long before, for example
in the modelling strategy associated with Box and Jenkins (1970). (See Mills
(2011), for an excellent overview in the context of the development of modern
time series analysis.)
An econometric or statistical procedure can be judged to be influential when
it reaches into undergraduate courses as well as becoming standard practice in
research papers and articles, and no serious empirical analysis of time series is
now presented without reporting seemingly obligatory unit root tests. However,
in the course of becoming standard practice many of the nuances and concerns
about the unit root testing framework, carefully detailed in the original research,
may be overlooked, the job being considered done once the unit root ‘tick box’ has
been checked.
Undergraduate dissertations and projects, as well as PhD theses, involving time
series, usually report Dickey-Fuller (DF) statistics or perhaps, depending on the
available software, tests in the form due to Elliott, Rothenberg and Stock (1996),
the ‘ERS’ tests. However, it can be the case that the analysis of the time series
properties of the data is somewhat superficial. Faced with the task of analysing
and modelling time series, I am struck by how careful one has to be and how
many possibilities there are that complicate the question of interpreting the out-
come of a unit root test. This is the starting point of this book and is what distinguishes
it from Volume 1.
When considering what to include here I took the view that the topics must
address the development of the standard unit root testing framework to situ-
ations that were likely to occur in practice. Moreover, whilst an overview of
as many problems as possible would have been one organising principle, this
approach would risk taking a ‘menu’-type solution to potential problems and,
instead, it would be preferable to address in detail some developments that have
simplicity.
Although much of the unit root framework has been developed in the econo-
metrics literature, the presence of key articles in general statistical journals (for
example, The Annals of Statistics, Biometrika, The Journal of the Royal Statistical
Society, The Journal of the American Statistical Association and so on) indicates
the wider interest and wider importance of the topic. Moreover, the tech-
niques have been finding interesting applications outside economics, showing
just how pervasive is the basic underlying problem. To give some illustrations,
Stern and Kaufman (1999) analysed a number of time series related to global
climate change, including temperatures for the northern and southern hemi-
spheres, carbon dioxide, nitrous oxide and sulphate aerosol emissions. Wang
et al. (2005) analysed the streamflow processes of 12 rivers in western Europe
for nonstationarity for the twentieth century, streamflow being an issue of con-
cern in the design of a flood protection system, a hydrological connection that
includes Hurst (1951) and Hosking (1984). Aspects of social behaviour have been
the subject of study, such as the nature of the gender gap in drink-driving (see
Schwartz and Rookey, 2008).
I noted in Volume 1 that the research and literature on this topic has grown
almost exponentially since Nelson and Plosser’s (N&P) seminal article published
in 1982. N&P applied the framework due to Dickey (1976) and Fuller (1976) to
testing for a unit root in a number of macroeconomic time series. Whilst the
problem of modelling nonstationary series was known well before 1982, and
was routinely taken into account in the Box-Jenkins methodology, the focus of
the N&P study was on the stochastic nature of the trend and its implications
for economic policy and the interpretation of ‘shocks’; this led to interest in the
properties of economic time series as worthy of study in itself.
I also noted in Volume 1 that articles on unit root tests are amongst the most
cited in economics and econometrics and have influenced the direction of eco-
nomic research at a much wider level. A citation summary for articles based
on univariate processes was included therein and showed that there has been a
sustained interest in the topic over the last thirty years. Out of interest I have
updated this summary and calculated the implied annual growth rate, with the
‘top 5’ on a citations basis presented in Table P1. The rate of growth is quite
astonishing and demonstrates the continuing interest in the topic of unit roots;
Table P1 (extract): Perron (1989), 3,371 rising to 3,932 citations, 11.5% implied annual growth; KPSS (1992), 3,280 rising to 3,996 citations, 15.0% implied annual growth;
for example, there have been 1,300 more citations of the seminal Dickey and
Fuller (1979) article in less than 18 months.
As in the case of Volume 1, appropriate prerequisites for this book include
some knowledge of econometric theory at an intermediate or graduate level as,
for example, in Davidson (2000), Davidson and MacKinnon (2004) or Greene
(2011); also some exposure to the basic DF framework for testing for a unit
root would be helpful, but this is now included in most introductory courses in
econometrics (see, for example, Dougherty, 2011).
The results of a number of Monte Carlo studies are reported in various
chapters, reflecting my view that simulation is a key tool in providing guidance
on finite sample issues. Many more simulations were run than are reported in
the various chapters, with the results typically illustrated for one or two sample
sizes where they are representative of a wider range of sample sizes.
Chapter 1: Introduction
First, the opportunity is taken to offer a brief reminder of some key concepts
that underlie the implicit language of unit root tests. In addition there is also an
introduction to an econometric problem that occurs in several contexts in test-
ing for a unit root and is referenced in later chapters. The problem is that of a
‘nuisance’ parameter that is present only under the alternative hypothesis, sometimes
referred to as the ‘Davies problem’. The problem and its
solution occur in several other areas of econometrics and it originates, at least
in its econometric interest, in application of the well-known Chow test (Chow,
1960) to the temporal stability of a regression model, but in the case that the
temporal point of instability is unknown.
included in UR, Vol. 1. The tests in the latter were largely parametric in the
sense that they were concerned with estimation in the context of an assumed
parametric structure, usually in the form of an AR or ARMA model, the slight
exception being in estimating the long-run variance based on a semi-parametric
procedure. In this volume nonparametric tests are considered in greater detail,
where such tests use information in the data, such as ranks, signs and runs, that
do not require it to be structured by a model.
do not use all of the frequency range in estimating the long-memory parameter.
Research on these methods has been a considerable growth area in the last ten
years or so, leading to a potentially bewildering number of estimation methods.
not observed. There are a number of models that allow unit root-type behaviour,
but take into account either the limits set by the data range or the economic
mechanisms that are likely to prevent unbounded random walks in the data.
These models have in common that they involve some form of nonlinearity.
The models considered in this chapter involve a smooth form of nonlinearity,
including smooth transition autoregressions based on the familiar AR model,
allowing the AR coefficients to change as a function of the deviation of an
‘activating’ or ‘exciting’ variable from its target or threshold value. A second
form of nonlinear model is the bounded random walk, referred to as a BRW,
which allows random walk behaviour to be contained by bounds, or buffers,
that ‘bounce’ the variable back if it is getting out of control.
The idea that regime change, rather than a unit root, was the cause of nonsta-
tionarity led to a fundamental re-evaluation of the simplicity of the dichotomy
of ‘opposing’ the mechanisms of a unit root process on the one hand and a
trend stationary process on the other. Examples of events momentous enough
to be considered as structural breaks include the Great Depression (or Crash),
the Second World War, the 1973 OPEC oil crisis and more recently ‘9/11’, the
financial crisis (the ‘credit crunch’) of 2008 and the government debt problem
in the Euro area. Under the alternative hypothesis of stationarity, trended time
series affected by such events could be modelled as stationary around a broken
or split trend. Chapter 7 is primarily concerned with the development of an
appropriate modelling framework for assessing the impact of a possible trend or
mean break at a known break date, which is necessary to understand the more
realistic cases dealt with in Chapter 8.
and time again Hanselman and Littlefield (2005) was an essential and highly
recommended companion to the use of MATLAB.
I would like to thank my publishers Palgrave Macmillan, and Taiba Batool
and Ellie Shillito in particular, for their continued support not only for this
book but in commissioning the series of which it is a part, namely ‘Palgrave
Texts in Econometrics’, and more recently extending the concept in a practical
way to a new series, ‘Palgrave Advanced Texts in Econometrics’, inaugurated by
publication of The Foundations of Time Series Analysis (Mills, 2011). These two
series have, at the time of writing, led to eight books, with more in press and,
combined with the Handbook of Econometrics published in two volumes (Mills
and Patterson, 2006, 2009), have made an important contribution to scholarly
work and educational facilities in econometrics. Moreover, as noted above, the
increasing ubiquity of modern econometric techniques means that the ‘reach’
of these contributions extends into non-economic areas, such as climatology,
geography, meteorology, sociology, political science and hydrology.
In the earlier stages of preparation of the manuscript and figures for this book
(as in the preparation of other books) I had the advantage of the willing and able
help of Lorna Eames, my secretary in the School of Economics at the University
of Reading, to whom I am grateful. Lorna Eames retired in April 2011. Those who
have struggled with the many tasks involved in the preparation of a mansuscript,
particularly one of some length, will know the importance of having a reliable
and willing aide; Lorna was exceptionally that person.
If you have comments on any aspects of the book, please contact me at my
email address given below.
Introduction
The first four sections of this chapter review some basic concepts that, in part,
serve to establish a common notation and define some underlying concepts for
later chapters. In particular Section 1.1 considers the nature of a stochastic pro-
cess, which is the conceptualisation underlying the generation of time series
data. Section 1.2 considers stationarity in its strict and weak forms. As in Vol-
ume 1, the parametric models fitted to univariate time series are often variants
of ARIMA models, so these are briefly reviewed in Section 1.3 and the concept of
the long-run variance, which is often a key component of unit root tests where
there is weak dependency in the errors, is outlined in Section 1.4. Section 1.5 is
more substantive and considers, by means of some simple illustrations, a problem
and its solution that arise in Chapters 5, 6 and 8. The problem is that of
designing a test statistic when there is a parameter that is not identified under
the null hypothesis. The two cases considered here are variants of the Chow test
(Chow, 1960), which tests for stability in a regression model. This section also
considers how to devise a routine to obtain the bootstrap distribution of a test
statistic and the bootstrap p-value of a sample test statistic.
example is given below.
A discrete-time stochastic process, ε(t), with T ⊆ N+, may be summarised as:
ε(t). If T = 3, then the sample space of yt, Ωy, is simply related to Ωε; specifically,
Ωy = {(1, 2, 3), (1, 2, 1), (1, 0, 1), (1, 0, −1), (−1, −2, −3), (−1, −2, −1), (−1, 0, −1),
(−1, 0, 1)}, and each ordered component of the sequence defines a sample
path. The resulting stochastic process is a binomial random walk and in this case,
as it has been assumed that the coin is fair, the random walk is symmetric. Whilst
the original stochastic process for εt is stationary, that for yt is not; the definition
of stationarity is considered in the next section. A simple generalisation of this
partial sum process is to generate the εt as draws from a normally distributed
random variable, giving rise to a Gaussian random walk.
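The enumeration of the sample paths for T = 3 can be verified directly; the following is a minimal sketch in Python (the book's own computations were carried out in MATLAB; the language choice here is mine), scoring a fair coin as ±1 and building the partial sums for every possible sequence of draws:

```python
from itertools import product

# All sequences of T = 3 draws from {+1, -1} (a fair coin scored as +/-1)
T = 3
paths = []
for eps in product([1, -1], repeat=T):
    # Partial sums y_t = eps_1 + ... + eps_t define one sample path
    y, path = 0, []
    for e in eps:
        y += e
        path.append(y)
    paths.append(tuple(path))

# Eight equally likely sample paths of the symmetric binomial random walk
print(sorted(set(paths)))
```

With a fair coin each of the eight paths has probability 1/8, which is what makes the random walk symmetric.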
Stochastic processes arise in many different applications. For example, pro-
cesses involving discrete non-negative numbers of arrivals or events are often
modelled as a Poisson process, an illustrative example being the number of
shoppers at a supermarket checkout in a given interval (for an illustration see
A Primer, chapter 3). A base model for returns on financial assets is that they
follow a random walk, possibly generated by a ‘fat-tailed’ distribution with a
higher probability of extreme events compared to a random walk driven by
Gaussian inputs. To illustrate, a partial sum process was defined as in the case of
the binomial random walk with the exception that the ‘inputs’ εt were drawn
from a Cauchy distribution (for which neither the mean nor the second moment
exist), so that there was a greater chance of ‘outliers’ compared to draws from
a normal distribution. The fat tails allow the simulations to mimic crisis events
such as the events following 9/11 and the 2008 credit crunch. Figure 1.1 shows
4 of the resulting sample paths over t = 1, . . . , 200. The impact of the fat tails
is evident and, as large positive and large negative shocks can and do occur, the
sample paths (or trajectories) show some substantial dips and climbs over the
sample period.
Figure 1.1  Four sample paths of a random walk with Cauchy inputs, yt against t = 1, …, 200
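A sketch of the kind of simulation behind Figure 1.1, with standard Cauchy draws as the inputs to the partial sum process; the seed, the path count and the use of numpy are illustrative choices, not the book's settings:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
T, n_paths = 200, 4

# Cauchy 'inputs': fat tails, so neither the mean nor the second moment exists
eps = rng.standard_cauchy(size=(n_paths, T))

# Partial sums give the random walk sample paths y_t = sum_{j<=t} eps_j
y = eps.cumsum(axis=1)

# Occasional very large single-period jumps are what mimic crisis events
print(y.shape, float(np.abs(eps).max()))
```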
P(yτ+1, yτ+2, …, yτ+T) = P(ys+1, ys+2, …, ys+T)

where P(.) is the joint probability mass function (pmf) for the random variables
enclosed in (.). Strict stationarity requires that the joint pmf for the sequence of
random variables of length T starting at time τ + 1 is the same for any shift in the
time index from τ to s and for any choice of T. This means that it does not matter
which T-length portion of the sequence is observed: each has been generated by
the same unchanging probability structure. A special case of this result in the
discrete case is that for T = 1, where P(yτ) = P(ys), so that the marginal pmfs must
be the same for all τ and s, implying that E(yτ) = E(ys). These results imply that
other moments, including joint moments such as the covariances, are invariant
to arbitrary time shifts.
Weak (or covariance) stationarity imposes conditions on the first two moments only:

E(yτ) = E(ys) = µy ⇒ the mean of the process is constant
var(yτ) = var(ys) = σ²y ⇒ the variance of the process is constant
cov(yτ, yτ+k) = cov(ys, ys+k) ⇒ the autocovariances are invariant to translation, for all k
Ross (2003) gives examples of processes that are weakly stationary but not strictly
stationary; also, a process could be strictly stationary, but not weakly stationary
by virtue of the non-existence of its moments. For example, a random process
where the components have unchanging marginal and joint Cauchy distribu-
tions will be strictly stationary, but not weakly stationary because the moments
do not exist.
The leading case of nonstationarity, at least in econometric terms, is that
induced by a unit root in the AR polynomial of an ARMA model for yt , which
implies that the variance of yt is not constant over time and that the k-th order
autocovariance of yt depends on t. A process that is trend stationary is one that is
stationary after removal of the trend. A case that occurs in this context is where
the process generates data as yt = β0 + β1 t + εt , where εt is a zero mean weakly
stationary process. This is not stationary of order one because E(yt ) = β0 + β1 t,
which is not invariant to t. However, if the trend component is removed either
as E(yt − β1 t) = β0 or E(yt − β0 − β1 t) = 0, the result is a stationary process.
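Detrending can be illustrated with simulated data; a sketch with illustrative values β0 = 2 and β1 = 0.5 (my choices), removing the fitted linear trend by OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
t = np.arange(1, T + 1)

# y_t = beta0 + beta1*t + eps_t, with eps_t weakly stationary (here iid N(0,1))
beta0, beta1 = 2.0, 0.5
y = beta0 + beta1 * t + rng.standard_normal(T)

# OLS regression of y on (1, t); the residuals are the detrended series
X = np.column_stack([np.ones(T), t])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
detrended = y - X @ coef

print(coef.round(2), round(float(detrended.mean()), 6))
```

The detrended series has a mean of (numerically) zero and inherits the stationarity of εt, which is the sense in which yt is trend stationary.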
Returning to the partial sum process yt = Σtj=1 εj, with εt a binomial random
variable, note that E(εt) = 0, E(ε²t) = σ²ε = 1 and E(εtεs) = 0 for t ≠ s. However, in
the case of yt, whilst E(yt) = Σtj=1 E(εj) = 0, the variance of yt is not constant as
t varies: E(y²t) = tσ²ε = t. Moreover, the autocovariances are neither equal to
zero nor invariant to a translation in time; for example, consider the first-order
autocovariances cov(y1, y2) and cov(y2, y3), which are one period apart; then
cov(y1, y2) = E[ε1(ε1 + ε2)] = σ²ε, whereas cov(y2, y3) = E[(ε1 + ε2)(ε1 + ε2 + ε3)] = 2σ²ε.
The general result is that cov(yt, yt+1) = tσ²ε, so the partial sum process generat-
ing yt is not covariance stationary. The assumption that εt is a binomial random
variable is not material to this argument, which generalises to any process with
E(ε²t) = σ²ε and E(εtεs) = 0 for t ≠ s, such as εt ∼ N(0, σ²ε).
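These moments can be checked by Monte Carlo; a sketch with ±1 binomial inputs, so that σ²ε = 1 (the replication count and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
R, T = 200_000, 5  # R replications of a short random walk

# eps_t = +1 or -1 with probability 1/2 each, so E(eps) = 0 and var(eps) = 1
eps = rng.choice([-1, 1], size=(R, T))
y = eps.cumsum(axis=1)  # partial sums y_t

# var(y_t) = t * sigma_eps^2 = t, so the variance grows with t
var_t = y.var(axis=0)

# cov(y_t, y_{t+1}) = t: the autocovariances shift with a translation in time
cov_12 = float(np.mean(y[:, 0] * y[:, 1]))  # approx 1
cov_23 = float(np.mean(y[:, 1] * y[:, 2]))  # approx 2
print(var_t.round(2), round(cov_12, 2), round(cov_23, 2))
```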
in the AR component and q in the MA component for the univariate process
generating yt, assuming that E(yt) = 0, is written as follows:

φ(L)yt = θ(L)εt

where φ(L) = 1 − Σpj=1 φjLj and θ(L) = 1 + Σqj=1 θjLj.
For economy of notation it is conventional to use, for example, φ(1) rather than
φ(L = 1). By taking the AR terms to the right-hand side, the ARMA(p, q) model
is written as:
The term µt has the interpretation of a trend function, the simplest and most
frequently occurring cases being where yt has a constant mean, so that µt = µ,
and yt has a linear trend, so that µt = β0 + β1 t.
The ARMA model can then be written in deviations form by first defining
ỹt ≡ yt − µt , with the interpretation that ỹt is the detrended (or demeaned) data,
With µ̂t replacing µt , (1.9) becomes:
where, for consistency with (1.8), µ∗t = φ(L)µt. For example, in the constant
mean and linear trend cases, µ∗t is given, respectively, by:

µ∗t = µ∗ = φ(1)µ (1.12a)

µ∗t = φ(L)(β0 + β1t) (1.12b)
    = φ(1)β0 + β1φ(L)t
    = β∗0 + β∗1t

where β∗0 = φ(1)β0 + β1 Σpj=1 jφj, β∗1 = φ(1)β1 and φ(1) = 1 − Σpj=1 φj.
An equivalent representation of the ARMA(p, q) model is to factor out the
dominant root, so that φ(L) = (1 − ρL)ϕ(L), where ϕ(L) is invertible, and the
resulting model is specified as follows:
(1 − ρL)yt = zt (1.13a)
ϕ(L)zt = θ(L)εt (1.13b)
zt = ϕ(L)−1θ(L)εt (1.13c)
This error dynamics approach (see UR, Vol. 1, chapter 3) has the advantage of
isolating all dynamics apart from that which might be associated with a unit
root (ρ = 1 in this case) into the error zt .
An important concept in unit root tests is the long-run variance. It is one of three
variances that can be defined when yt is determined dynamically, depending on
the extent of conditioning in the variance. This is best illustrated with a simple
example and then generalised. Consider an AR(1) model:
yt = ρyt−1 + εt
   = (1 − ρL)−1εt
   = Σ∞i=0 ρi εt−i    moving average form (1.16)

The conditional variance, var(yt | yt−1) = σ²ε, takes yt−1 as given; the unconditional
variance also takes the variation of yt−1 into account:

σ²y = var(ρyt−1 + εt)
    = var(Σ∞i=0 ρi εt−i)
    = Σ∞i=0 ρ2i σ²ε    because cov(εt−i, εt−j) = 0 for i ≠ j
    = σ²ε/(1 − ρ²)

The long-run variance instead sums the moving average coefficients before squaring,
so that σ²y,lr = (Σ∞i=0 ρi)² σ²ε = σ²ε/(1 − ρ)².
In general, yt is generated as:

yt = w(L)εt (1.18)

where the lag polynomial w(L) is the causal linear (MA) filter governing the response of
{yt} to {εt}. In an ARMA model, the MA polynomial is w(L) = Σ∞j=0 wjLj = φ(L)−1θ(L), with
w0 = 1; for this representation to exist, the roots of φ(L) must lie outside the unit
circle, so that φ(L)−1 is defined. The long-run variance of yt, σ²y,lr, is then just
σ²y,lr = w(1)² σ²ε, where w(1) = Σ∞j=0 wj.
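For an AR(1) with ρ = 0.5 and σ²ε = 1, the variances can be computed from the MA coefficients wj = ρ^j; a sketch comparing truncated sums with the closed forms σ²y = σ²ε/(1 − ρ²) and σ²y,lr = σ²ε/(1 − ρ)² (the truncation point is an arbitrary choice):

```python
import numpy as np

rho, sigma2_eps = 0.5, 1.0

# MA (Wold) coefficients of the AR(1): w_j = rho^j, j = 0, 1, ...
J = 200  # truncation point; rho^200 is negligible
w = rho ** np.arange(J)

# Unconditional variance: sum of *squared* MA coefficients, times sigma2_eps
sigma2_y = sigma2_eps * np.sum(w ** 2)   # -> 1/(1 - rho^2)

# Long-run variance: square of the *sum* of MA coefficients, w(1)^2
sigma2_lr = sigma2_eps * np.sum(w) ** 2  # -> 1/(1 - rho)^2

print(round(float(sigma2_y), 4), round(float(sigma2_lr), 4))
```

The order of the operations, square-then-sum against sum-then-square, is exactly what separates the unconditional variance from the long-run variance.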
1.5 The problem of a nuisance parameter only identified under
the alternative hypothesis
This section considers the general nature of a problem that occurs in different
developments of models with unit roots, the solution to which has applications
in later chapters, particularly Chapters 5, 6 and 8. The problem of interest is
sometimes referred to as the Davies problem, after Davies (1977, 1987), who
considered hypothesis testing when there is a nuisance parameter that is only
present under the alternative hypothesis. An archetypal case of such a problem
is where the null model includes a linear trend, which is hypothesised to change
at some particular point in the sample. If that breakpoint was known, obtain-
ing a test statistic would just require the application of standard principles, for
example an F test based on fitting the model under the alternative hypothesis.
However, in practice, the breakpoint is unknown and one possibility is to search
over all possible breakpoints and calculate the test statistic for each possibility;
however, the result of this enumeration over all possibilities is that the resulting
statistic, the F test for example, no longer has an F distribution.
To illustrate the problem and its solution, we consider a straightforward
problem of testing for structural stability in a bivariate regression with a non-
stochastic regressor and then in the context of a simple AR model. At this stage
the exposition does not emphasise the particular problems that arise if a unit
root is present (either under the null or the alternative). The aim here is to set
out a simple framework that is easily modified in later chapters.
discrete events, often institutional in nature, for example a change in the tax
regime or in political institutions.
In the second case there are again two regimes, but they are separated by an
endogenous indicator function, rather than the exogenous passage of time. The
typical case here is where the data sequence {yt }Tt=1 is generated by an AR(p)
model under H0 , whereas under HA the coefficients of the AR(p) model change
depending on a function of lagged yt relative to a threshold κ. In both models the
nature of the change is quite limited. According to HA the basic structure of the
model is not changed, but the coefficients are allowed to change. The common
technical problem of interest here is that typically the structural breakpoint or
threshold is not known, whereas standard tests assume that these are known.
Consider the standard problem of applying the Chow test for a structural
break. A simple bivariate model, with a nonstochastic regressor, is sufficient to
illustrate the problem.
yt = α0 + α1 xt + εt t = 1, . . . , T (1.19)
where εt ∼ iid(0, σε2 ), that is {εt }Tt=1 is a sequence of independent and identically
distributed ‘shocks’ with zero mean and constant variance, σε2 ; and, along the
lines of an introductory approach, assume that xt is fixed in repeated samples.
The problem is to assess whether the regression model has been ‘structurally
stable’ in the sense that the regression coefficients α0 and α1 have been constant
over the period t = 1, . . . T, where for simplicity σε2 is assumed a constant. There
are many possible schemes for inconstancy, but one that is particularly simple
and has found favour is to split the overall period of T observations by intro-
ducing a breakpoint (in time) Tb , such that there are T1 observations in period 1
followed by T2 observations in period 2, thus T = T1 + T2 and t = 1, . . . , T1 , T1 +
1, . . . , T; note that Tb = T1 . Such a structural break is associated with a discrete
change related to external events.
According to this scheme the regression model may be written as:

yt = α0 + α1xt + (δ0 + δ1xt)I(t > T1) + εt (1.20a)

where I(t > T1) = 1 if the condition in (.) is true and 0 otherwise. There will be
several uses of this indicator function in later chapters. The null hypothesis of
no change is H0: δ0 = δ1 = 0 and the alternative hypothesis is that one or both
of the regression coefficients has changed, HA: δ0 ≠ 0 and/or δ1 ≠ 0.
A regression for which H0 is not rejected is said to be stable. A standard test
statistic, assuming, as we do here, that there are sufficient observations in each
regime to enable estimation, is an F test, which will be distributed as F(2, T − 4)
if we add the assumption that εt is normally distributed (so that it is niid) or T is
sufficiently large to enable the central limit theorem to deliver normality. The
F test statistic, first in general, is:
C = [(ε̂′r ε̂r − ε̂′ur ε̂ur)/(ε̂′ur ε̂ur)] × (T − K)/g ∼ F(g, T − K) (1.21)

and, in the present case, with g = 2 and K = 4,

C = [(ε̂′r ε̂r − ε̂′ur ε̂ur)/(ε̂′ur ε̂ur)] × (T − 4)/2 (1.22)
where ε̂r and ε̂ur are the vectors of residuals from fitting (1.19) and (1.20a),
respectively. At the moment the assumption that xt is fixed in repeated samples
removes complications due to possible stochastic regressors.
The F test is one form of the Chow test (Chow, 1960) and obviously generalises
quite simply to more regressors. A large sample version of the Chow test uses the
result that if a random variable, C, has the F(g, T – K) distribution, then gC ⇒D
χ2 (g), where ⇒D indicates weak convergence or convergence in distribution
(see A Primer, chapter 4). The test is sometimes presented in Wald form as:
W = T (ε̂′r ε̂r − ε̂′ur ε̂ur)/(ε̂′ur ε̂ur) (1.23)
  = g [T/(T − K)] C
  ≈ gC

so that, in large samples,

W ⇒D χ²(g) (1.24)
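The statistics C and W for a known breakpoint can be sketched as follows; the data generating process, sample size and break date are illustrative choices (the null of stability is true by construction), and the exact algebraic relation between W and C holds regardless:

```python
import numpy as np

rng = np.random.default_rng(7)
T, Tb = 100, 50
t = np.arange(1, T + 1)
x = t / T  # a nonstochastic regressor
y = 1.0 + 2.0 * x + rng.standard_normal(T)  # stable model: H0 is true

def rss(X, y):
    # Residual sum of squares from OLS of y on X
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

Xr = np.column_stack([np.ones(T), x])        # restricted: no break
D = (t > Tb).astype(float)                   # indicator I(t > T1)
Xur = np.column_stack([Xr, D, D * x])        # unrestricted: break at Tb

g, K = 2, 4
C = (rss(Xr, y) - rss(Xur, y)) / rss(Xur, y) * (T - K) / g
W = T * (rss(Xr, y) - rss(Xur, y)) / rss(Xur, y)
print(round(float(C), 3), round(float(W), 3))
```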
Figure 1.2  Quantile-quantile plot: simulated quantiles of C against the quantiles of F(2, T − 4)
each set of simulations and, for simplicity, λb = 0.5 is used for this illustration.
Figure 1.2 shows a quantile-quantile plot taking the first quantile from F(2, T – 4)
and the matching quantile from the simulated distribution of C, based on R =
50,000 replications; as a reference point, Figure 1.2 also shows as the solid line
the 45°line resulting from pairing the quantiles of F(2, T – 4) against themselves.
The conformity of the simulated and theoretical distributions is evident from
the figure.
In the second case, λb is still taken as given for a particular set of R replications,
but λb is then varied to generate further sets of R replications; in this case λb
∈ Λ = [0.10, 0.90], so that there is 10% trimming, and the range of Λ is divided into
a grid of G = 81 equally spaced points (so that, with T = 100, each integer value of Tb
from 10 to 90 is considered as a possible breakpoint). This enables an assessment of the variation, if any,
in the simulated distributions as a function of λb (but still taking λb as given for
each set of simulations). The 95% quantiles for each of the resulting distributions
are shown in Figure 1.3, from which it is evident that, although there are some
variations, the quantiles are practically invariant, even for λb close to the beginning and
end of the sample (the maximum number of observations in each regime is at
the centre of the sample period, so that the number of observations in one of
the regimes declines as λb → 0.1 or λb → 0.9).
As noted, the more likely case in practice is that a structural break is suspected
but the timing of it is uncertain, so that λb is unknown. The questions then
arise as to what might be taken as a single test statistic in such a situation and
Figure 1.3  95% quantiles of the simulated distributions of C as a function of λb, with the 95% quantile of F(2, 96) shown for reference
what is the distribution of the chosen test statistic. One possibility is to look for
the greatest evidence against the null hypothesis of structural stability in the
set Λ, so that the chosen test statistic is the supremum of C, say Sup-C, over all
possible breakpoints, with the breakpoint being estimated as that value of λb,
say λ̂b, that is associated with Sup-C. This idea is associated with the work
of Andrews (1993) and Andrews and Ploberger (1994), who also suggested the
arithmetic average and the exponential average of the C values over the set Λ.
For reference these are as follows:

Sup-C = sup{C(λb): λb ∈ Λ} (1.25)
C(λb)ave = G−1 Σλb∈Λ C(λb) (1.26)
C(λb)exp = ln[G−1 Σλb∈Λ exp(C(λb)/2)] (1.27)
Note that Andrews presents these test statistics in their Wald forms referring
to, for example, Sup-W(.). The F form is used here as it is generally the more
familiar form in which the Chow test is used; the translation to the Wald form
was given in (1.23). Note that neither C(λb)ave nor C(λb)exp provides an estimate
of the breakpoint.
Andrews (1993) obtained the asymptotic null distributions of the Supremum
versions of the Wald, LR and LM Chow tests and showed them to be standard-
ised tied-down Bessel processes. Hansen (1997b) provides a method to obtain
approximate asymptotic p-values for the Sup, Exp and Ave tests. Diebold and
Chen (1996) consider structural change in an AR(1) model and suggested using
a bootstrap procedure which allows for the dynamic nature of the regression
model and the practicality of a finite sample, for which asymptotic critical values
are likely to be a poor guide.

Figure 1.4  CDFs of F(2, 96), Sup-C(λb) and C(λb)ave; 90% quantiles shown: 1.85 for C(λb)ave, 2.36 for F(2, 96) and 5.08 for Sup-C(λb)
Note that the distribution of Sup-C(λ̂) is not the same as the distribution of
C(λb ), where the latter is C evaluated at λb , so that quantiles of C(λb ) should
not be used for hypothesis testing where a search procedure has been used to
estimate λb . The effect of the search procedure is that the quantiles of Sup-C(λ̂)
are everywhere to the right of the quantiles of the corresponding F distribution.
On the other hand the averaging procedure in C(λb )ave tends to reduce the
quantiles that are relevant for testing.
To illustrate, the CDFs for F(2, 96), Sup-C(λ̂) and C(λb )ave are shown in Figure
1.4 for the procedure that searches over λb ∈ Λ = [0.10, 0.90], with G = 81 equally
spaced points, for each simulation. For example, the 90% and 95% quantiles of
the distributions of Sup-C(λ̂) are 5.08 and 6.02, and those of C(λb )ave are 1.85
and 2.28, compared to 2.36 and 3.09 from F(2, 96).
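The search procedure can be sketched as follows: C(λb) is computed over a trimmed grid of breakpoints and the Sup, Ave and Exp statistics are formed from the results; the data generating process and seed are illustrative choices, with the null of no break true by construction:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100
t = np.arange(1, T + 1)
x = t / T
y = 1.0 + 2.0 * x + rng.standard_normal(T)  # H0 (no break) is true

def chow_C(y, x, Tb, g=2, K=4):
    # Chow F statistic for a break at Tb in a bivariate regression
    T = len(y)
    Xr = np.column_stack([np.ones(T), x])
    D = (np.arange(1, T + 1) > Tb).astype(float)
    Xur = np.column_stack([Xr, D, D * x])
    def rss(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        e = y - X @ b
        return e @ e
    return (rss(Xr) - rss(Xur)) / rss(Xur) * (T - K) / g

# 10% trimming: candidate breakpoints Tb = 10, ..., 90 (G = 81 points)
grid = np.arange(10, 91)
C_vals = np.array([chow_C(y, x, Tb) for Tb in grid])

sup_C = C_vals.max()
ave_C = C_vals.mean()
exp_C = np.log(np.mean(np.exp(C_vals / 2)))
Tb_hat = int(grid[C_vals.argmax()])  # breakpoint estimate tied to Sup-C
print(round(float(sup_C), 3), round(float(ave_C), 3), round(float(exp_C), 3), Tb_hat)
```

By construction Sup-C dominates the averaged statistics, which is the shift to the right visible in Figure 1.4.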
in yt . If the threshold is determined in this way such models are termed self-
exciting, as the movement from one regime to another is generated within the
model’s own dynamics. This gives rise to the acronym SETAR for self-exciting
threshold autoregression if the threshold depends on lagged yt, and to MSETAR
(sometimes shortened to MTAR) for momentum SETAR if the threshold depends
on changes in yt.
The following illustrates a TAR of order one, TAR(1), which is ‘self-exciting’:
yt = ϕ0 + ϕ1 yt−1 + εt if f(yt−1 ) ≤ κ, regime 1 (1.28)
yt = φ0 + φ1 yt−1 + εt if f(yt−1 ) > κ, regime 2 (1.29)
where εt ∼ iid(0, σε2 ), thus, for simplicity, the error variance is assumed con-
stant across the two regimes; for expositional purposes we assume that f(yt−1 )
is simply yt−1 , so that the regimes are determined with reference to a single
lagged value of yt . It is also assumed that |ϕ1 | < 1 and |φ1 | < 1, so that there
are no complications arising from unit roots (the unit root case is considered in
chapter 6). Given this stability condition, the implied long-run value of yt , y∗ ,
depends upon which regime is generating the data, being y∗1 = ϕ0 /(1 − ϕ1 ) in
Regime 1 and y∗2 = φ0 /(1 − φ1 ) in Regime 2.
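The regime-dependent long-run values can be illustrated by simulating the TAR(1) of (1.28)–(1.29); the parameter values are illustrative choices (not from the book) satisfying the stability conditions, with κ = 0 and a common error variance:

```python
import numpy as np

rng = np.random.default_rng(11)

# Illustrative parameters: both regimes stable, |phi1| < 1
c1, a1 = 1.0, 0.5    # regime 1: applies when y_{t-1} <= kappa
c2, a2 = -1.0, 0.5   # regime 2: applies when y_{t-1} > kappa
kappa = 0.0

# Implied long-run (attractor) values in each regime
y_star_1 = c1 / (1 - a1)   # regime 1 attractor
y_star_2 = c2 / (1 - a2)   # regime 2 attractor

# Simulate the self-exciting TAR(1): the regime depends on lagged y
T = 10_000
y = np.zeros(T)
for t in range(1, T):
    if y[t - 1] <= kappa:   # regime 1
        y[t] = c1 + a1 * y[t - 1] + rng.standard_normal()
    else:                   # regime 2
        y[t] = c2 + a2 * y[t - 1] + rng.standard_normal()

print(y_star_1, y_star_2, round(float(y.mean()), 2))
```

With these (symmetric) choices each regime's attractor lies in the other regime, so the process keeps switching and oscillates around the threshold.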
The problem is to assess whether the regression model has been ‘structurally
stable’, with stability defined as H0: ϕ0 = φ0 and ϕ1 = φ1, whereas HA: ϕ0 ≠ φ0
and/or ϕ1 ≠ φ1 is ‘instability’ in this context. As in the Chow case, the two
regimes can be written as one generating model:

yt = (ϕ0 + ϕ1yt−1)I(yt−1 ≤ κ) + (φ0 + φ1yt−1)I(yt−1 > κ) + εt (1.30)
under HA. The problems of estimating κ and, hence, the regime split, and of testing
for the existence of one regime against two, are solved together and take
the same form as in the Chow test.
In respect of the first problem, specify a grid of N possible values for κ, say κ(i) ∈
K and then estimate (1.30) for each κ(i) , taking as the estimate of κ the value of
κ(i) that results in a minimum for the residual sum of squares over all possible
values of κ(i) . As in the Chow test, some consideration must be given to the range
of the grid search. In a SETAR, κ is expected to be in the observed range of yt , but
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
this criterion must be coupled with the need to enable sufficient observations in
each regime to empirically identify the TAR parameters. Thus, starting from the
ordered values of yt , say y(1) < y(2) < . . . < y(T) , some observations are trimmed
from the beginning and end of the sequence {y(t) }T1 , with, typically, 10% or 15%
of the observations trimmed out of the sample to define K.
Obtaining the minimum of the residual sum of squares from the grid search
also leads to the Supremum F test: the residual sum of squares under H0 is given
for all values of κ(i) , hence the F-test will be maximised where the residual sum
of squares is minimised under HA . The resulting estimator of κ is denoted κ̂.
The average and exponential average test statistics are defined as in (1.26) and
(1.27), respectively. The test statistics are referred to as Sup-C(κ̂), C(κ(i) )ave and
C(κ(i) )exp to reflect their dependence on κ̂ and κ(i) ∈ K, respectively.
In the case of unknown κ (as well as known κ), the finite sample quantiles can
be obtained by simulation or by a natural extension to a bootstrap procedure.
To illustrate, the quantiles of the test statistics with unknown κ were simulated
as in the case for Figure 1.4, but with data generated as follows: y0 is a draw from
N(0, 1/(1 − ϕ12 )), that is a draw from the unconditional distribution of yt , which
assumes stationarity (and normality); subsequent observations were generated
by yt = ϕ1 yt−1 + εt , εt ∼ N(0, 1), with ϕ1 = 0.95 and T = 100. The Chow-type
tests allowed for a search over κ(i) ∈ K = [y(0.1T) ,y(0.9T) ], i = 1, . . . , 81. The CDFs
for F(2, 96), Sup-C(λ̂) and C(λb )ave are shown in Figure 1.5, from which it is
evident that the CDFs are very similar to those in the corresponding Chow case
(see Figure 1.4). The 90% and 95% quantiles of the simulated distributions of
Sup-C(λ̂) are 5.15 and 6.0, and those of C(λb )ave are 1.83 and 2.25, compared
to 2.36 and 3.09 from F(2, 96).
0.9
F(2, 96)
0.8
Sup-C(λb)
C(λb)ave
0.7
0.6
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
0.5
0.4
0.3
0.2
90% quantiles shown
0.1
1.83 2.36 5.15
0
0 1 2 3 4 5 6 7 8 9
simulation the process shocks, εt , are drawn from a specified distribution, usually
N(0, σε2 ), and normally it is possible to set σε2 = 1 without loss. Thus, if the
empirical distribution function is not close to the normal distribution function,
differences are likely to arise in the distribution of the test statistic of interest.
The bootstrap steps are as follows.
Calculate the required test statistics, for example Sup-C(.), C(.)ave and C(.)exp .
3. Generate the bootstrap data as:
The bootstrap innovations ε̂bt are drawn with replacement from the residuals
{ε̂t }Tt=1 . The initial value in the bootstrap sequence of ybt is yb0 , and there are
(at least) two possibilities for this start up observation. One is to choose yb0
at random from {yt }Tt=1 , another, the option taken here, is to choose yb0 = y1 ,
so that the bootstrap is ‘initialised’ by an actual value.
4. Estimate the restricted (null) and unrestricted (alternative) regressions using
bootstrap data over the grid, either λb ∈ or κ(i) ∈ K:
ŷbt = ϕ̂0b + ϕ̂1b yt−1 + I(.)γ̂0b + I(.)γ̂1b ybt−1 , ebt = ybt − ŷbt
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
Calculate the bootstrap versions of the required test statistics, say Sup-Cb (.),
Cb (.)ave and Cb (.)exp . The test statistics may be computed in Wald or F form.
It is usually easier to calculate the test statistics in the form of the difference
in the restricted and unrestricted residual sums of squares; alternatively they
can be calculated using just the unrestricted regression, see Q1.2, in which
b,r
case there is no need to calculate ŷt .
5. Repeat steps 3 and 4 a total of B times, giving B sets of bootstrap data and B
values of the test statistics.
6. From step 5, sort each of the B test statistics and thus obtain the bootstrap
(cumulative) distribution.
7. The quantiles of the bootstrap distribution are now available from step 6;
additionally the bootstrap p-value, pbs , of a sample test statistic can be esti-
mated by finding its position in the corresponding bootstrap distribution. For
example, pbs [Sup-C(.)] = #[Sup-Cb (.) > Sup-C(.)]/B, where # is the counting
function (B + 1 is sometimes used in the denominator, but B should at least
be large enough for this not to make a material difference).
This bootstrap procedure assumes that there is not a unit root under the null
hypothesis. If this is not the case, then the bootstrap data should be generated
with a unit root (see Chapters 6 and 8 and UR, Vol. 1, chapter 8).
3.5
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
2.5
1.5
1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010
where ŷ∗ is the estimated steady state. Two variations are considered, the first
where possible structural instability is related to an unknown breakpoint and
the other to the regime indicator yt−1 ≤ κ, where κ is unknown, so that the
alternative is a two-regime TAR.
The models estimated for each regime, together with their implied steady states,
are as follows.
Regime 1
Regime 2
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
squares is T̂1 = 561, so that λ̂b = 0.473 and this estimate is well inside . The
Chow tests allowing for the search procedure are: Sup-C(λ̂b ) = 8.42, C(λb )ave =
4.21 and C(λb )exp = 2.69. The simulated 95% quantiles (assuming an exogenous
regressor) for T = 1,187 are 5.75, 2.22 and 1.37, respectively, which suggest
rejection of the null hypothesis of no temporal structural break. There are
two variations on obtaining appropriate critical values. First, by way of con-
tinuing the illustration, the quantiles should allow for the stochastic regressor
yt−1 , so a second simulation was undertaken using an AR(1) generating process
yt = ϕ1 yt−1 + εt , with ϕ1 = 0.95 and y0 ∼ N(0, 1/(1 − ϕ12 )), εt ∼ N(0, 1). In this
case, the 95% quantiles were 7.26, 2.49 and 1.55, each slightly larger than in
the case with a nonstochastic regressor. Lastly, the quantiles and p-values were
obtained by bootstrapping the test statistic to allow for the possibility that the
empirical distribution function is generated from a dependent or non-normal
process. The bootstrap 95% quantiles, with B = 1,000, for Sup-C(λ̂b ), C(λb )ave
and C(λb )exp were 6.08, 2.17 and 1.36, with bootstrap p-values, pbs , of 1%, 0.5%
and 1%, respectively.
Two-regime TAR
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
an issue to investigate in a more substantive analysis. (Fitting an AR(2) model,
though preferably to try and pick up some of the cyclical behaviour, resulted in
very similar estimates of κ̂ and ŷ∗ , and does not add to the illustration.)
The Chow-type tests allowing for a search over κ(i) ∈ K were: Sup-C(κ̂) = 7.80,
C(κ(i) )ave = 3.01 and C(κ(i) )exp = 2.07. Two variations were considered to obtain
the quantiles of the null distribution. As in the case of the Chow test for temporal
stability, the generating process was taken to be AR(1), yt = ϕ1 yt−1 + εt , with
ϕ1 = 0.95 and y0 ∼ N(0, 1/(1 − ϕ12 )), εt ∼ N(0, 1). In this case, the 95% quantiles
for Sup-C(κ̂), C(κ(i) )ave and C(κ(i) )exp were 6.37, 2.2 and 1.35, so H0 is rejected.
The choice of ϕ1 = 0.95 was motivated by the LS estimate of ϕ1 , which is fairly
close to the unit root; but having started in this way a natural extension is
to bootstrap the quantiles and p-values. The bootstrap 95% quantiles for Sup-
C(λ̂b ), C(λb )ave and C(λb )exp were 6.82, 2.3 and 1.41, with bootstrap p-values,
pbs , of 2.2%, 0.2% and 0.6%, respectively.
A chapter summary was provided in the preface, so the intention here is not
to repeat that but to focus briefly on the implications of model choice for fore-
casting, which implicitly encapsulates inherent differences, such as the degree
of persistence, between different classes of models. Stock and Watson (1999)
compared forecasts from linear and nonlinear models for 215 monthly macroe-
conomic time series for horizons of one, six and twelve months. The models
considered included AR models in levels and first differences and LSTAR (logis-
tic smooth transition autoregressive) models (see Chapter 5). They used a form
of the ERS unit root test as a pretest (see Elliott, Rothenberg and Stock, 1996,
and UR, Vol. 1, chapter 7). One of their conclusions was that “forecasts at all
horizons are improved by unit root pretests. Severe forecast errors are made in
nonlinear models in levels and linear models with time trends, and these errors
are reduced substantially by choosing a differences or levels specification based
on a preliminary test for a unit root” (Stock and Watson, 1999, p. 30). On the
issue of the importance of stationarity and nonstationarity to forecasting see
Hendry and Clements (1998, 1999).
The study by Stock and Watson was extensive, with consideration of many
practical issues such as lag length selection, choice of deterministic components
and forecast combination, but necessarily it had to impose some limitations. The
class of forecasting models could be extended further in several directions. A unit
root pretest suggests the simple dichotomy of modelling in levels vs differences,
whereas a fractional unit root approach, as in Chapters 3 and 4, considers a con-
tinuum of possibilities; and Smith and Yadav (1994) show there are forecasting
costs to first differencing if the series is generated by a fractionally integrated
process. Second, even if taking account of a unit root seems appropriate there
remains the question of whether the data should first be transformed, for exam-
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
ple in logarithms, so that another possibility is a pretest that includes the choice
of transformation, which itself could affect the outcome of a unit root test, (see
Chapter 2).
The LSTAR models considered by Stock and Watson were one in a class of
nonlinear models that allow smooth transition (ST) between AR regimes, others
include: the ESTAR (exponential STAR) and BRW (bounded random walk), con-
sidered in Chapter 5; piecewise linear models either in the form of threshold
models, usually autoregressive, TAR, as in Chapter 6; or temporally contiguous
models that allow for structural breaks ordered by time, with known breakpoint
or unknown breakpoint, as considered in Chapters 7 and 8, respectively.
Questions
Q1.1. Rewrite the TAR(1) model of Equation (1.30) so that the base model is
Regime 1:
A1.1. Note that indicator function is written so that it takes the value 1 if yt−1 >
κ, whereas in Equation (1.30) it was written as I(yt−1 ≤ κ). Recall that γ0 = ϕ0 −φ0
and γ1 = ϕ1 − φ1 , hence making the substitutions then, as before:
Regime 1
yt = ϕ0 + ϕ1 yt−1 + εt
Regime 2
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
that is, approximately in finite samples W is g times the corresponding F-version
of the test.
A1.2. First rewrite the TAR(1) in an appropriate matrix-vector form:
where:
The restricted model imposes (δ0 , δ1 ) = (0, 0). The restrictions can be expressed
in the form Rβ = s, where
β1
0 0 1 0 β2
0
=
0 0 0 1 β3 0
β4
Although this could be reduced to
1 0 β3 0
=
0 1 β4 0
∂L/ = −2Z y + 2Z Zβ − 2R λ = 0
∂β
∂L/ = −2(Rβ − s) = 0
∂λ
Let br and λr denote the solutions to these first order conditions, then:
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
br = (Z Z)−1 Z y + (Z Z)−1 R λr
= b + (Z Z)−1 R λr
The restricted residual vector and restricted residual sum of squares are then
obtained as follows:
A A = [Z(Z Z)−1 R [R(Z Z)−1 R ]−1 (Rb − s)] [Z(Z Z)−1 R [R(Z Z)−1 R ]−1 (Rb − s)]
= (Rb − s) [R(Z Z)−1 R ]−1 R(Z Z)−1 Z Z(Z Z)−1 R [R(Z Z)−1 R ]−1 (Rb − s)
= (Rb − s) [R(Z Z)−1 R ]−1 R(Z Z)−1 R [R(Z Z)−1 R ]−1 (Rb − s)
= (Rb − s) [R(Z Z)−1 R ]−1 (Rb − s)
Hence:
W = (Rb − s) [σ̃ε,ur2
R(Z Z)−1 R ]−1 (Rb − s)
ε̂ ε̂r − ε̂ur ε̂ur
=T r
ε̂ur ε̂ur
T ε̂r ε̂r − ε̂ur ε̂ur (T − k)
=g
(T − k) ε̂ur ε̂ur g
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
T ε̂ ε̂r − ε̂ur ε̂ur (T − k)
=g C, C= r
(T − k) ε̂ur ε̂ur g
≈ gC
E(S2T ) sums all the elements in the covariance matrix of y, of which there are T
of the form E(y2t ), 2(T – 1) of the form E(yt yt−1 ), 2(T – 2) of the form E(yt yt−2 )
and so on until 2E(y1 yT ), which is the last in the sequence. If the {y2t } sequence
is covariance stationary, then E(y2t ) = γ0 and E(yt yt−k ) = γk , hence:
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
= Tγ0 + 2 (T − j)γj
j=1
T−1
T−1 E(S2T ) = γ0 + 2T−1 (T − j)γj
j=1
T−1
= γ0 + 2 (1 − j T)γj
j=1
2
σy,lr ≡ limT→∞ T−1 E(S2T )
∞
= γ0 + 2 γj
j=1
In taking the limit it is legitimate to take j as fixed and let the ratio j/T tend to
∞
zero. Covariance stationarity implies γk = γ−k , hence γ0 + 2 ∞ j=1 γj = j=−∞ γj
and, therefore:
∞
2
σy,lr = γj
j=−∞
A1.3.ii. First back substitute to express yt in terms of current and lagged εt and
then obtain the variance and covariances:
∞
yt = ρi εt−i
i=0
γ0 ≡ var(yt )
∞
= ρ2i E(ε2t−i ) using cov(εt εs ) = 0 for t = s
i=0
= ρk (1 − ρ2 )−1 σε2
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
= (1 − ρ)−2 σε2
Next note that yt = ρyt−1 + εt may be written as yt = w(L)εt where w(L) =
(1 − ρL)−1 , hence:
var[w(1)εt ] = w(1)2 σε2
= (1 − ρ)−2 σε2 as w(1)2 = (1 − ρ)−2
2
= σy,lr
A1.3.iii. The sdf for a (covariance stationary) process generating data as yt =
w(L)εt is:
fy (λj ) = |w(e−iλj )|2 fε (λj ) λj ∈ [−π, π]
where w(e−iλj ) is w(.) evaluated at e−iλj , and fε (λj ) = (2π)−1 σε2 , for all λj , is the
spectral density function of the white noise input εt . Thus, the sdf of yt is the
power transfer function of the filter, |w(e−iλj )|2 , multiplied by the sdf of white
noise. If λj = 0, then:
fy (0) = |w(e0 )|2 fε (0)
= (2π)−1 w(1)2 σε2
= (2π)−1 σy,lr
2
The sdf may also be defined directly in terms of either the Fourier transform
of the autocorrelation function, as in Fuller (1996, chapter 3), or of the auto-
covariance function, as in Brockwell and Davis (2006, chapter 4), the resulting
definition differing only by a term in γ(0). We shall take the latter representation
so that
k=∞
fy (λj ) = (2π)−1 γ(k)e−ikλj
k=−∞
Hence if λj = 0, then:
k=∞
fy (0) = (2π)−1 γ(k)
k=−∞
= (2π)−1 σy,lr
2
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
Introduction
The chapter starts with an issue that is often not explicitly addressed in empir-
ical work. It is usually not known whether a variable of interest, yt , should be
modelled as it is observed, referred to as the levels (or ‘raw’ data), or as some
transformation of yt , for example by taking the logarithm or reciprocal of the
variable. Regression-based tests are sensitive to this distinction, for example
modelling in levels when the random walk is in the logarithms, implying an
exponential random walk, will affect the size and power of the test.
The second topic of this chapter is to consider nonparametric or semi-
parametric tests for a unit root, which are in contrast to the parametric tests
that have been considered so far. The DF tests, and their variants, are parametric
tests in the sense that they are concerned with direct estimation of a regression
model, with the test statistic based on the coefficient on the lagged dependent
variable. In the DF framework, or its variants (for example the Shin and Fuller,
1998, ML approach), the parametric structure is that of an AR or ARMA model.
Nonparametric tests use less structure in that no explicit parametric framework
is required and inference is based on other information in the data, such as
ranks, signs and runs. Semi-parametric tests use some structure, but it falls short
of a complete parametric setting; an example here is the rank score-based test
outlined in Section 2.3, which is based on ranks, but requires an estimator of
the long-run variance to neutralise it against non-iid errors.
This chapter progresses as follows. The question of what happens when a
DF-type test is applied to the wrong (monotonic) transformation is elaborated
in Section 2.1. If the unit root is in the log of yt , but the test is formulated
in terms of yt , there is considerable ‘over-rejection’ of the null hypothesis, for
example, an 80% rejection rate for a test at a nominal size of 5%, which might
be regarded as incorrect. However, an alternative view is that this is a correct
decision, and indeed 100% rejection would be desirable as the unit root is not
28
present in yt (in the terminology of Section 2.2, the generating process is a log-
linear integrated process not a linear integrated process). There are two possible
responses to the question of whether if a unit root is present it is in the levels or
the logs. A parametric test, due to Kobayashi and McAleer (1999) is outlined in
Section 2.2; this procedure involves two tests reflecting the non-nested nature
of the alternative specifications. This combines a test in which the linear model
is estimated and tested for departures in the direction of log-linearity and then
the roles are reversed, resulting in two test statistics. An alternative approach
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
is to use a test for a unit root that is invariant to a monotonic transformation,
such as the log transformation.
A test based on the ranks is necessarily invariant to monotonic transforma-
tions. Hence, a more general question can be posed: is there a unit root in the
process generating yt or some monotonic transformation of yt ? Thus, getting
the transformation ‘wrong’, for example using levels when the data should be in
logs, will not affect the outcome of such a test. Equally you don’t get something
for nothing, so the information from such a test, for example that the unit root
is not rejected, does not tell you which transformation is appropriate, just that
there is one that will result in H0 not being rejected.
Some unit root tests based on ranks are considered in Section 2.3. An obvious
starting point is the DF test modified to apply to the ranks of the series. The
testing framework is to first rank the observations in the sample, but then use
an AR/ADF model on the ranked observations. It turns out that this simple
extension is quite hard to beat in terms of size retention and power. Another
possibility is to take the parametric score or LM unit root test (see Schmidt and
Phillips, 1992), and apply it to the ranks. This test deals with the problem of
weakly dependent errors in the manner of the PP tests by employing a semi-
parametric estimator of the long-run variance, although an AR-based estimator
could also be used.
The rank ADF test and the rank-score test take parametric tests and modify
them for use with the ranked observations. A different approach is to construct a
nonparametric test without reference to an existing parametric test. One exam-
ple of this approach, outlined in Section 2.4, is the range unit root test, which is
based on assessing whether the marginal sample observation represents a new
record in the sense of being larger than the existing maximum or smaller than
the existing minimum, the argument being that stationary and nonstationary
series add new records at different rates, which will serve to distinguish the two
processes.
Finally, Section 2.5 considers a test based on the variance-ratio principle. In
this case, the variance of yt generated by a nonstationary process with a unit
root grows at a different rate compared to when yt is generated by a stationary
process, and a test can be designed around this feature. The test described, due
to Breitung (2002), which is particularly simple to construct, has the advantage
that it does not require specification of the short-run dynamics, and in that
sense it is ‘model free’.
Some simulation results and illustrations of the tests in use are presented in
Section 2.6. The emphasis in the simulation results is on size retention and
relative power of the different tests; however, the advantages of nonparametric
and semi-parametric tests are potentially much wider. In different degrees they
are derived under weaker distributional assumptions than parametric tests and
are invariant to some forms of structural breaks and robust to misspecification
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
of generally unknown short-run dynamics.
wt = wt−1 + εt (2.1)
wt = β1 + wt−1 + εt (2.2)
where εt ∼ iid(0, σε2 ), σε2 < ∞. The two cases considered are the no transformation
case, wt = yt , and the log transformation case, wt = ln yt . In the first case, yt
follows a random walk (RW) in the natural units of yt , and in the second case yt
follows an exponential random walk (ERW), which is a log-random walk; thus,
in the second case, the generating process is, first without drift:
ln yt = ln yt−1 + εt (2.3)
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
(ε1 +···+εt )
yt = y 0 e y0 > 0 (2.4b)
ln yt = β1 + ln yt−1 + εt (2.5)
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
yt > 0 for all t in the sample, then KD (op. cit., Theorem 2) show that both
as σε2 → 0 and as σε2 → ∞, then ρ̂ ⇒ 1, implying P(δ̂ < δα |H0 ) → 0; that is, the
rejection probability, conditional on yt > 0, tends to zero, which is the limit to
under-rejection. The simulation results reported below confirm this result for δ̂,
but indicate that it does not apply generally when there is a constant and/or a
trend in the maintained regression.
In Case 2, that is the random walk is in logs, wt = ln yt , but the test is applied to
the levels, yt , the rejection probabilities increase with σε2 and T; under-rejection
occurs only for small sample sizes or for very small values of the innovation
variance, for example σε2 = 0.01 (see KD, op. cit., theorem 1). Thus, generally,
the problem is of over-rejection. These results are confirmed below and do carry
across to the maintained regression with either a constant and/or a trend and
to the pseudo-t tests.
Case 1
KD (op. cit.) reported that the rejection probabilities for δ̂ in Case 1 were zero
for all design parameters tried (for example, variations in σε2 , T and starting
values, and introducing drift in the DGP), and this result was confirmed in
the simulations reported in Table 2.1, which are based on 50,000 replications.
However, some qualifications are required that limit the generality of this result.
First, τ̂ was under-sized, but at around 2.6%–2.8% rather than 0%, for a nominal
5% test. The biggest difference arises when the maintained regression includes
Table 2.1 Levels or logs: incorrect specification of the maintained regression, impact on
rejection rates of DF test statistics (at a 5% nominal level)
T = 100
σε2 = 0.5 0.0% 4.8% 4.5% 2.7% 5.2% 4.2%
σε2 = 1.0 0.0% 5.2% 4.8% 2.6% 5.3% 4.7%
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
T = 200
σε2 = 0.5 0.0% 5.7% 5.6% 2.8% 5.3% 4.6%
σε2 = 1.0 0.0% 5.4% 5.2% 2.6% 5.1% 4.3%
Notes: Case 1 is where the random walk is in ‘natural’ units, wt = yt , but the testing takes place using
ln yt ; Case 2 is where there is an exponential random walk, so the correct transformation is wt = ln yt , but
the testing takes place using yt . The test statistics are of the general form δ̂ = T(ρ̂ − 1) and τ̂ = (ρ̂ − 1)/σ̂(ρ̂),
where σ̂(ρ̂) is the estimated standard error of ρ̂; a subscript indicates the deterministic terms included in
the maintained regression, with μ for a constant and β for a constant and linear trend. The maintained
regressions are deliberately misspecified; for example in Case 1, the maintained regression is specified
in terms of ln yt and ρ̂ is the estimated coefficient on ln yt−1 and in Case 2 the maintained regression is
specified in terms of yt and ρ̂ is the estimated coefficient on yt−1 .
a constant and/or a trend. In these cases, both the n-bias (δ̂) and pseudo-t tests
(τ̂) maintain their size.
Case 2
The results in KD (op. cit.) are confirmed for δ̂ in Case 2. Further, the over-
rejection is found to be more general, applying to τ̂, and to the δ̂-type and
τ̂-type tests when the maintained regression includes a constant and/or a trend;
however, the over-rejection tends to decline as more (superfluous) deterministic
terms are included in the maintained regression. For example, the rejection rates
for σε2 = 1.0 and T = 100 are as follows: τ̂, 93.1%; τ̂μ , 89.2%; and τ̂β , 84.1%. Also
the rejection rates tend to increase as σε2 increases.
The over-rejection in Case 2 is sometimes looked on as an incorrect decision –
there is a unit root so it might be thought that the test should not reject; how-
ever, it is the correct decision in the sense that the unit root is not in the levels
of the series, it is in the logs. Failure to reject could be more worrying as mod-
elling might then proceed incorrectly in the levels, whereas it should be in the
logarithms.
Kobayashi and McAleer (1999a), hereafter KM, have suggested a test to discrim-
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
inate between an integrated process defined alternately on the levels or the
logarithms of the data. In each testing situation, the asymptotic distribution
of the test statistic depends on whether there is drift in the unit root process.
Such tests are likely to be of particular use when a standard unit root test, such
as one of the DF tests, indicates that the series both in the levels and in the
logarithms is I(1). In this case, the tests are not sufficiently discriminating and
the appropriate KM test is likely to be useful.
wt = β1 + wt−1 + zt (2.7a)
p
ψ(L)zt = εt ψ(L) = 1 − ψi Li (2.7b)
i=1
⇒
p
1− ψi Li (wt − β1 ) = εt (2.8)
i=1
If the data are generated by a linear integrated process, LIP, with drift, then
wt = yt ; whereas if the data are generated by a log-linear integrated process,
LLIP, with drift, then wt = ln(yt ). In both cases KM assume that εt ∼ iid(0, σε2 ),
E(ε3t ) = 0, E(ε4t ) < ∞, and the roots of ψ(L) are outside the unit circle (see UR Vol.1,
Appendix 2). The drift parameter β1 is assumed positive, otherwise negative
values of yt are possible, thus precluding a log-linear transformation. εt and the
β̂∗1
β̂1 = (2.9c)
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
ψ̂(1)
1 T
σ̂ε2 = ε̂2 (2.9d)
T − (p + 1) t=p+1 t
Note that when testing the LIP against the LLIP, the estimator and residuals are
defined in terms of wt = yt ; whereas when testing the LLIP against the LIP, the
estimator and residuals are defined in terms of wt = ln yt . The following test
statistics, derived by KM (op. cit.), make the appropriate substitutions.
1.i V1 : LIP with drift is the null model: test for departure in favour of the LLIP;
1.ii U1 : LIP without drift is the null model: test for departure in favour of the
LLIP;
2.i V2 : LLIP with drift is the null model: test for departure in favour of the LIP;
2.ii U2 : LLIP without drift is the null model: test for departure in favour of the
LIP.
1.i The first case to consider is when the null model is the LIP with drift, then:
T
T−3/2 yt−1 (ε2t − σε2 ) ⇒D N(0, ω2 ) (2.11)
t=p+1
β21 4
ω2 = σ
6 ε
⇒ Test statistic, V1 :
−3/2 1 T 2 2
V1 ≡ T yt−1 (ε̂t − σ̂ε ) ⇒D N(0, 1) (2.12)
ω̂ t=p+1
1/2
β̂21 4
ω̂ = σ̂
6 ε
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
Comment: when there is drift, then with an appropriate standardisation, the
test statistic is asymptotically normally distributed with zero mean and unit
variance, and the sample value is compared to a selected upper quantile of N(0,
1), for example the 95% quantile; large values lead to rejection of the null of a
LIP with drift.
1.ii The next case is when the null model is the LIP without drift, then:
T 1 1 1
T−1 yt−1 (ε2t − σε2 ) ⇒D ω B1 (r)dB2 (r) − B1 (r)dr dB2 (r)
t=p+1 0 0 0
√ σε3
ω= 2 (2.13)
ψ(1)
⇒ Test statistic, U1 :
1 1 1 1
U1 = T−1 yt−1 (ε̂2t − σ̂ε2 ) ⇒D B1 (r)dB2 (r) − B1 (r)dr dB2 (r)
ω̂ 0 0 0
√ σ̂ε3
ω̂ = 2 (2.14)
ψ̂(1)
where B1 (r) and B2 (r) are two standard Brownian motion processes with zero
covariance.
Comment: when drift is absent, the test statistic has a non-standard distri-
bution; the null hypothesis (of a LIP) is rejected when the sample value of U1
exceeds the selected (upper) quantile; some quantiles are provided in Table 2.2
below.
2.i. The second case to consider is when the null model is the LLIP with drift,
then:
T
T−3/2 − ln yt−1 (ε2t − σε2 ) ⇒D N(0, ω2 ) (2.15)
t=p+1
2 β21 4
ω = σ
6 ε
⇒ Test statistic, V2 :
1 T
V2 ≡ T−3/2 − ln yt−1 (ε̂2t − σ̂ε2 ) ⇒D N(0, 1) (2.16)
ω̂ t=p+1
1/2
β̂21 4
ω̂ = σ̂
6 ε
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
as standard normal, and the sample value is compared with a selected upper
quantile of N(0, 1); large values lead to rejection of the null of a LLIP with drift.
2.ii. The next case is when the null model is the LLIP without drift:
T 1 1 1
T−1 − ln yt−1 (ε2t − σε2 ) ⇒D − B1 (r)dB2 (r) − B1 (r)dr dB2 (r)
t=p+1 0 0 0
√ σϕ3
= 2 (2.17)
ψ(1)
⇒ Test statistic, U2 :
1 1 1
1
U2 = T−1 yt−1 (ε̂2t − σ̂ε2 ) ⇒D − B1 (r)dB2 (r) − B1 (r)dr dB2 (r)
ˆ 0 0 0
√ σ̂ε3
ˆ = 2 (2.18)
ψ̂(1)
where W1 (r) and W2 (r) are two Brownian motion processes with zero
covariance.
Comment: as in the LIP case, ‘large’ values of U2 lead to rejection of the null,
in this case that the DGP is a LLIP.
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
(9.2) (9.6) (4.5) (4.6) (0.8) (0.9)
1,000 1.32 1.28 1.65 1.62 2.36 2.30
(10.7) (10.0) (5.0) (4.7) (1.1) (0.9)
∞ 1.282 1.282 1.645 1.645 2.326 2.326
(10.0) (10.0) (5.0) (5.0) (1.0) (1.0)
Notes: Table entries are quantiles, with corresponding size in (.) brackets, where size is the empirical
size using quantiles from N (0, 1) for Table 2.2a and from the quantiles for T = ∞ for Table 2.2b; the
latter are from KM (op. cit., table 1). Results are based on 20,000 replications. A realised value of the
test statistic exceeding the (1 – α)% quantile leads to rejection of the null model at the α% significance
level.
The likely numerical magnitude of the drift will depend on whether the integrated process is linear or loglinear and on the frequency of the data. In Table 2.2a, the critical values were presented for β1 = σε, which is 1 in units of the innovation standard error (β1/σε = 1). Table 2.3 considers the sensitivity of these results to the magnitude of the drift for the case T = 100 and κ = β1/σε; as in Table 2.2a the results are in the form of quantiles with size in (.) parentheses, where the size is the empirical size if the quantiles from N(0, 1) are used.
The results in Table 2.3 show that, apart from small values of κ, for example κ = 0.1, the quantiles are robust to variations in κ, but there is a slight under-sizing throughout for ψ1 = 0.0; the actual size of a test at the nominal 5% level
[Table 2.3 (final rows): quantiles, with empirical size in (.) beneath]

         90%             95%             99%
κ = 2.0  1.22   1.24     1.57   1.61     2.24   2.27
         (9.2)  (9.2)    (4.1)  (4.7)    (0.8)  (0.8)
κ = 3.0  1.21   1.29     1.57   1.62     2.19   2.21
         (8.9)  (10.3)   (4.2)  (4.8)    (0.8)  (0.6)
κ = 4.0  1.22   1.29     1.57   1.63     2.26   2.29
         (8.9)  (10.3)   (4.2)  (4.8)    (0.8)  (0.9)
N(0, 1)  1.282  1.282    1.645  1.645    2.326  2.326
         (10.0) (10.0)   (5.0)  (5.0)    (1.0)  (1.0)
being between about 4% and 4.5%. However, the empirical size is closer to the
nominal size for ψ1 = 0.8. Overall, whilst there is some sensitivity to sample size
and the magnitudes of ψ1 and β1 , the quantiles are reasonably robust to likely
variations.
2.2.2.iii Motivation
Whilst four test statistics were outlined in the previous section, they are motivated by similar considerations. In general, fitting the wrong model will induce heteroscedasticity, which is detected by assessing the (positive) correlation between the squared residuals from the fitted model and y_{t−1} in the case of the LIP and −ln y_{t−1} in the case of the LLIP.
To illustrate, consider the case where the DGP is the exponential random walk,
ERW, which is the simplest LLIP, but the analysis is pursued in levels; further,
assume initially that zt = εt . Then:
Δy_t = (y_t/y_{t−1} − 1) y_{t−1}   (2.19)
     = η y_{t−1}   (2.20)

where η = y_t/y_{t−1} − 1 = (e^{(ε_t + β₁)} − 1).
Given that ε_t and y_{t−1} are uncorrelated, the conditional variance of Δy_t given y_{t−1} is proportional to y²_{t−1} when the DGP is the ERW, implying that (Δy_t)² and y²_{t−1} are positively correlated. However, when the DGP is a LIP, there is no correlation between (Δy_t)² and y²_{t−1}; thus, the correlation coefficient between (Δy_t)² and y²_{t−1} provides a diagnostic test statistic when the linear model is fitted. KM (op. cit.) find that the asymptotic distribution is better approximated when the test statistic is taken as the correlation coefficient between (Δy_t)² and y_{t−1}, with large values leading to rejection of the linear model. When z_t follows an AR process, then ε_t², or ε̂_t² in practice, replaces (Δy_t)², which should be uncorrelated with y_{t−1} under the null of the LIP.
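The diagnostic logic can be sketched as follows; the drift-only first-difference fits and the function name are illustrative assumptions, not KM's exact regressions:

```python
import numpy as np

def km_diagnostic_correlations(y):
    """Correlation of squared residuals with y_{t-1} (linear null) and with
    -ln y_{t-1} (log-linear null); a clearly positive value signals
    misspecification-induced heteroscedasticity in that fitted model."""
    dy, dln = np.diff(y), np.diff(np.log(y))
    e_lin = dy - dy.mean()            # residuals, linear model with drift
    e_log = dln - dln.mean()          # residuals, log-linear model with drift
    r_lin = np.corrcoef(e_lin**2, y[:-1])[0, 1]
    r_log = np.corrcoef(e_log**2, -np.log(y[:-1]))[0, 1]
    return r_lin, r_log

# Example: data from an exponential random walk with drift (a LLIP DGP)
rng = np.random.default_rng(0)
y = np.exp(np.cumsum(0.002 + 0.02 * rng.standard_normal(5000)))
r_lin, r_log = km_diagnostic_correlations(y)
# expect r_lin clearly positive (linear fit misspecified), r_log near zero
```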
is based on the ‘with drift’ or ‘without drift’ model. Some of the results from
the power simulations in KM (op. cit.) are summarised in Table 2.4. Evidently
the test when there is drift is considerably more powerful than in the without
drift case. Even with T = 100, power is 0.96 when the fitted model is linear,
but the DGP is actually log-linear, and 0.81 when the fitted model is log-linear,
but the DGP is actually linear; however, in the absence of drift, these values
drop to 13% and 12% in the former case and only 3% and 4% in the latter case,
respectively. This suggests caution in the use of the non-drifted tests, which
require quite substantial sample sizes to achieve decent power. However, KM (op. cit.) found that their tests are more powerful, sometimes considerably so, than an alternative test for heteroscedasticity, in this case the well-known LM test for ARCH; see Kobayashi and McAleer (1999b) for a power comparison of a number of tests.
2.2.3 Illustrations
As noted, the KM non-nested tests are likely to be of greatest interest when a standard unit root test is unable to discriminate which of two integrated processes generated the data, finding that both the levels and the logarithms
support non-rejection of the unit root. An important prior aspect of the use of
the tests is whether the integrated process is with or without drift. Also, since
the issue is whether the unit root is in either the levels or the logarithms, the
data should be everywhere positive. The economic context is thus likely to be of
importance, for example the default for aggregate expenditure variables, such
as consumption, GDP, imports and so on, is that if generated by an integrated
process, there is positive drift; whereas series such as unemployment rates are
unlikely to be generated by a drifted IP.
daily London Fix prices for the period 2nd January, 1985 to 31st March 2006;
weekends and bank holidays are excluded, giving an overall total of 5,372 obser-
vations. This series, referred to as yt, is obviously everywhere positive, so there is an issue to be decided about whether it is generated by an LIP or an LLIP.
In neither case is there evidence of drift (see Figure 2.1, left-hand panel for the
‘levels’ data and right-hand panel for the log data), and the appropriate KM tests
are, therefore, U1 and U2 , that is the driftless non-nested tests. The levels series
has a maximum of 100.8 and a minimum of 38.3, and it crosses the sample
mean of 68.5, 102 times, that is just 1.9% of the sample observations, which
is suggestive of random walk-type behaviour. The size of the sample provides
a safeguard against the low power indicated by the simulation results for the
driftless tests.
For illustrative purposes, the standard ADF test τ̂μ was used. The maximum
lag length was set at 20, but in both the linear and loglinear versions of the
ADF regressions BIC selected ADF(1); in any case, longer lags made virtually
no difference to the value of τ̂μ , which was –2.433 for the LIP and –2.553 for
the LLIP, both leading to non-rejection of the null hypothesis at conventional
significance levels, for example, the 5% critical value is –2.86. For details of the
regressions, see Table 2.5.
As an indicator of the likely result of the KM tests, note that the correla-
tion coefficient between the squared residuals and yt−1 in the linear case is
0.004, indicating no misspecification, whereas for the log-linear case the cor-
relation coefficient between the squared residuals and − ln yt−1 is 0.107. The
latter correlation suggests that if a unit root is present it is in the linear model,
which is confirmed by the sample value of U1 = 0.097, which is not significant,
whereas U2 = 4.57, which is well to the right of the 99th percentile of 1.116, see
Table 2.2b.
[Figure 2.1 Daily gold price, 1985–2005: left-hand panel, levels; right-hand panel, logs.]
V2 , which allow for drift; and the DF pseudo-t test is now τ̂β , which includes a
time trend in the maintained regression to allow the alternative hypothesis to
generate a trended series.
Whilst, in this example, BIC suggested an ADF(1) regression, the marginal-t
test criterion indicated that higher order lags were important and, as the sample
is much smaller than in the first illustration, an ADF(5) was selected as the test
model. As it happens, the KM test statistics were, however, virtually unchanged
[Figure 2.2 World oil production, c. 1973–2006: left-hand panel, levels (×10⁴); right-hand panel, logs.]
between the two specifications and the conclusions were not materially different
for the different lag lengths. As in the first illustration, the DF test is unable to discriminate between the linear and log specifications, with τ̂β = −1.705 and τ̂β = −1.867, respectively, neither of which leads to rejection of the null hypothesis, with a 5% critical value of −3.41; see Table 2.6 for the regression details.
The correlation coefficient between the squared residuals and yt−1 in the linear
case is –0.094, indicating no misspecification in the log-linear direction, whereas
for the log-linear case the correlation coefficient between the squared residuals
and − ln yt−1 is 0.132. The KM test values are V1 = –4.77, which is wrong-signed
for rejection of the null hypothesis, and V2 = 15.83, which has a p-value of zero
under the null. Hence, if there is an integrated process, then the conclusion is
in favour of a linear model rather than a log-linear model.
A problem with applying standard parametric tests for a unit root is that the
distribution of the test statistic under the null and alternative is not generally
invariant to a monotonic transformation of the data. This was illustrated in
Section 2.1 for the DF tests for the case of loglinear transformations, although
Table 2.6 Regression details: world oil production

KM tests
Null, linear; alternative, log-linear:     V1 = −4.77    95% quantile = 1.645
Null, log-linear; alternative, linear:     V2 = 15.83    95% quantile = 1.645
it holds more generally (see GH, 1991). Basing a unit root test on ranks avoids
this problem.
The situation to be considered is that the correct form of the generating pro-
cess is not known other than, say, it is a monotonic and, hence, order-preserving
function of the data yt ; thus, as before, let yt be the ‘raw’ data and let wt = f(yt )
be some transformation of the data, such that wt is generated by an autoregres-
sive process. The linear versus log-linear choice was a special case of the more
general situation. In the (simplest) first order case, the generating process is:
w_t = ρw_{t−1} + ε_t,   ε_t ∼ iid(0, σ_ε²), σ_ε² < ∞   (2.21)
The parameter ρ is identified and, hence, we can form the null hypothesis, H0 :
there exists a monotonic function wt = f(yt ) such that ρ = 1 in wt = ρwt−1 + εt .
A ranks-based test proceeds as follows. Consider the values of Y = {y_t}_{t=1}^{T} ordered into a set (sequence) such that Y_(r) = {y_(1) < y_(2) < . . . < y_(T)}; then y_(1) and y_(T) are the minimum and maximum values in the complete set of T observations. To find the rank, r_t, of y_t, first obtain its place in the ordered set Y_(r); then its rank is just the subscript value; the set of ranks is denoted r. For example, if {y_t}_{t=1}^{3} = {y_1 = 3, y_2 = 2, y_3 = 4}, then Y_(r) = {y_(1) = y_2, y_(2) = y_1, y_(3) = y_3}, and the ranks are r = {r_t}_{t=1}^{3} = {2, 1, 3}. Most econometric software contains a routine to sort and rank a set of observations.
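The ranking step in the worked example can be done in a few lines of numpy (a sketch; dedicated routines such as scipy.stats.rankdata do the same job):

```python
import numpy as np

def ranks(y):
    """Rank of each observation: its position in the ordered set Y_(r)."""
    order = np.argsort(y, kind="stable")    # indices that sort y
    r = np.empty(len(y), dtype=int)
    r[order] = np.arange(1, len(y) + 1)     # invert the sorting permutation
    return r

print(ranks(np.array([3, 2, 4])))           # [2 1 3], as in the worked example
```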
R_t = ρ_r R_{t−1} + ν_t   (2.22)

τ̂_{r,μ} = γ̂_r / σ̂(γ̂_r) = (ρ̂_r − 1) / σ̂(ρ̂_r)   (2.25)
where ρ̂r is the LS estimator based on the ranks and the testing procedure is
otherwise as in the standard case.
and τ̂r,β .
In the event that the ν_t are serially correlated, GH suggested a rank extension of the augmented DF test, say RADF, in which the maintained regression (here using the centred ranks) is:

ΔR_t = γ_r R_{t−1} + Σ_{j=1}^{k−1} c_{r,j} ΔR_{t−j} + ν_t   (2.26)
Although this version of a rank test has not been justified formally, the simula-
tion results reported in Section 2.6, show that the RADF test, with the simulated
quantiles for the DF rank test, does perform well; see also Fotopoulus and
Ahn (2003). By way of comparison, the rank-score test due to Breitung and
Gouriéroux (1997), hereafter BG, considered in the next Section, 2.3.2, does
allow explicitly for serially correlated errors.
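A sketch of the RADF regression (2.26) using plain least squares; the function and its lag conventions are illustrative assumptions, and the resulting statistic must be compared with simulated rank quantiles (for example, Table 2.8), not the standard DF tables:

```python
import numpy as np

def radf_tstat(y, k=2):
    """Sketch of the RADF pseudo-t: OLS of Delta R_t on R_{t-1} and k-1
    lagged Delta R terms, where R_t are the mean-adjusted ranks of y_t;
    returns the t-ratio on R_{t-1}."""
    T = len(y)
    r = np.empty(T)
    r[np.argsort(y, kind="stable")] = np.arange(1.0, T + 1)
    R = r - (T + 1) / 2.0                     # mean-adjusted ranks
    dR = np.diff(R)                           # Delta R_t for t = 2, ..., T
    yv = dR[k-1:]                             # dependent variable, t = k+1, ..., T
    cols = [R[k-1:T-1]]                       # R_{t-1}
    for j in range(1, k):
        cols.append(dR[k-1-j:len(dR)-j])      # Delta R_{t-j}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, yv, rcond=None)
    resid = yv - X @ beta
    s2 = resid @ resid / (len(yv) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return beta[0] / se

rng = np.random.default_rng(3)
t_stat = radf_tstat(np.cumsum(rng.standard_normal(200)), k=3)
```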
BG (op. cit.) show that the asymptotic distribution of τ̂_{r,μ} is not the same as that of the corresponding parametric test, the key difference being that T^{−1} r_{[rT]} does not converge to a Brownian motion, where T is the sample size, 0 ≤ r ≤ 1 and
[.] represents the integer part. Fotopoulus and Ahn (2003) extend these results
to obtain the relevant asymptotic distributions. If the standard DF quantiles are
used, then the rank tests tend to be oversized. The extent of the over-sizing is
illustrated in Table 2.7, which reports the results of a small simulation study
where the innovations are drawn from one of N(0, 1) and t(2). Although the
wrong sizing is not great, for example 6.1% for a nominal 5% test with T = 200,
the over-sizing does not decline with the sample size.
Table 2.7 Size of δ̂_{r,μ} and τ̂_{r,μ} using quantiles for δ̂_μ and τ̂_μ
Notes: results are for 50,000 replications of a driftless random walk; the δ̂_μ and τ̂_μ tests are from a DF regression with an intercept; δ̂_{r,μ} and τ̂_{r,μ} use the mean-adjusted ranks.

[Table 2.8 (extract): quantiles for δ̂_{r,β} and τ̂_{r,β}]

          δ̂_{r,β}                      τ̂_{r,β}
T         100      200      500        100     200     500
1%        −31.30   −32.67   −34.14     −4.20   −4.20   −4.20
5%        −23.17   −24.53   −25.68     −3.60   −3.60   −3.63
10%       −19.80   −20.85   −21.85     −3.29   −3.30   −3.35
Rather than use the percentiles for the standard DF tests, the percentiles can
be simulated; for example, some critical values for δ̂r,μ , τ̂r,μ , δ̂r,β and τ̂r,β for
T = 100, 200 and 500 are provided in Table 2.8. The maintained regression for
these simulations used the mean-adjusted ranks, with 50,000 replications of the
null of a driftless random walk. Other sources for percentiles are Fotopoulus and
Ahn (2003), who do not use mean-adjusted ranks, and GH (1991), who include a
constant in the maintained regression rather than use the mean-adjusted ranks.
As in the case of the standard DF tests, a sample value more negative than the critical value leads to rejection of the null of a unit root in the data.
More precisely, rejection of the null carries the implication that, subject to the
limitations of the test, there is not an order-preserving transformation of the data that provides support for the null hypothesis. Non-rejection implies that there is such a transformation (or, trivially, that no transformation is necessary) that is consistent with a unit root AR(1) process; but it does not indicate what the transformation is, which has to be the subject of further study. Other rank-based
tests are considered in the next sections.
H0 : Δy_t = β₁ + ε_t   (2.27)
HA : (1 − ρL)(y_t − β₀ − β₁t) = ε_t,  |ρ| < 1   (2.28)
where εt ∼ iid and E(εt ) = 0, with common cdf denoted F(εt ). The aim is to
obtain a test statistic for H0 that can be extended to the case where the sequence
{ε_t} comprises serially correlated elements. It is convenient here to adopt the notation that the sequence starts at t = 0, thus {ε_t}_{t=0}^{T}.
The test statistic is an application of the score, or LM, principle to ranks. In
the parametric case, the test statistic for H0 is:
S_β = [Σ_{t=2}^{T} x_t S_{t−1}] / [Σ_{t=2}^{T} S²_{t−1}]   (2.29)

where x_t = Δy_t − β̂₁, which is the demeaned data under the null, where β̂₁ is the sample mean of Δy_t, that is (y_T − y_0)/T; and S_t is the partial sum of x_t, given by S_t = Σ_{j=1}^{t} x_j.
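The computation in (2.29) is only a few lines (a sketch of the point estimate; the statistic's null distribution is nonstandard):

```python
import numpy as np

def score_stat(y):
    """Sketch of the score (LM) statistic S_beta in (2.29): x_t are the
    demeaned first differences (beta1-hat = (y_T - y_0)/T is their mean),
    S_t the partial sums; returns sum(x_t S_{t-1}) / sum(S_{t-1}^2)."""
    x = np.diff(y)
    x = x - x.mean()
    S = np.cumsum(x)                       # S_t = x_1 + ... + x_t
    return np.sum(x[1:] * S[:-1]) / np.sum(S[:-1]**2)

s = score_stat(np.cumsum(np.random.default_rng(5).standard_normal(300)))
```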
Analogously, the rank version of this test is:

S_{r,β} = [Σ_{t=2}^{T} r̃_t S^R_{t−1}] / [Σ_{t=2}^{T} (S^R_{t−1})²]   (2.30)

where r̃_t is the (demeaned) rank of Δy_t among (Δy_1, . . . , Δy_T); that is, the ordering is now with respect to Δy_t, such that Δy_(1) < Δy_(2) < . . . < Δy_(T). The ordering is with respect to Δy_t rather than y_t as, under the LM principle, the null of a unit root is imposed. A β subscript indicates that the null and alternative hypotheses allow trended behaviour in the time series, either due to drift under H0 or a trend under HA.
S^R_t is the partial sum to time t of the (demeaned) ranks of Δy_t, that is S^R_t = Σ_{j=1}^{t} r̃_j. The ranks of the differences are demeaned by subtracting (T + 1)/2, or T/2 if the convention is that t = 1 is the start of the process. That is, let R̃(Δy_t) be the rank of Δy_t among (Δy_1, . . . , Δy_T), then define:

r̃_t ≡ R̃(Δy_t) − (T + 1)/2   demeaned ranks   (2.31)

r̃^n_t ≡ r̃_t / T   (2.32)
BG (op. cit.) show that the numerator of S_{r,β} is a function of T only, in which case a test statistic can be based just on the denominator, with their test statistic a function of λ, where:

λ = Σ_{t=1}^{T} (S^R_t)²   (2.33)

This expression differs from the denominator of S_{r,β} only by the inclusion of (S^R_T)².
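Pulling (2.31) and (2.33) together, a minimal sketch (the normalisation and scaling needed for λ_{r,β} itself are left to the main text):

```python
import numpy as np

def bg_lambda(y):
    """lambda of (2.33): ranks of the first differences, demeaned by
    (T + 1)/2 as in (2.31), partial summed, squared and summed."""
    d = np.diff(y)
    T = len(d)
    r = np.empty(T)
    r[np.argsort(d, kind="stable")] = np.arange(1.0, T + 1)
    r_tilde = r - (T + 1) / 2.0            # demeaned ranks, (2.31)
    SR = np.cumsum(r_tilde)                # partial sums S^R_t
    return np.sum(SR**2)                   # (2.33)

lam = bg_lambda(np.array([1.0, 3.0, 2.0, 6.0, 4.0]))  # differences 2, -1, 3, -2
```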
where r̃tn is given by (2.32). The distributional results are based on the following
theorem (BG, op. cit., Theorem and Remark A):
λ_{r,β} ⇒_D ∫₀¹ (B(r) − rB(1))² dr   (2.35)
Finite sample critical values are given in BG (op. cit.), some of which are
extracted into Table 2.9 below. The test is ‘left-sided’ in the sense that a sam-
ple value less than the α% quantile implies rejection of the null hypothesis. The
null distribution of λr,β is independent of F(εt ), both asymptotically and in finite
samples; and the limiting null distribution is invariant to a break in the drift
during the sample, that is, if, under the null, β1 shifts to β1 + δ at some point Tb .
Referring back to H0 of (2.27), a special case arises if β1 = 0, corresponding
to the random walk without drift, yt = εt ; however as λr,β is a test statistic
based on the ranks of yt , it is invariant to monotonic transformations of yt ,
a simple example being when a constant is added to each observation in the
series, so the same test statistic would result under this special null.
Variations on the test statistic λ_{r,β} are possible, which may improve its small-sample performance. BG (op. cit.) suggest a transformation of the ranks using the inverse standard normal distribution function, which can improve power when the distribution of ε_t is not normal (see BG, op. cit., table 2). A nonlinear transformation is applied to the normalised ranks r̃^n_t using the inverse of the standard normal distribution function, Φ^{−1}(.), and the revised ranks are:
The test statistic, with this variation, using r̃^ins_t rather than r̃^n_t, is referred to as λ^(ins)_{r,β}. Some quantiles for λ_{r,β} and λ^(ins)_{r,β} are given in Table 2.9.

[Table 2.9 Quantiles for λ_{r,β} and λ^(ins)_{r,β}, for T = 100, 250 and ∞. Source: extracted from BG (op. cit., table 6).]
First, define the long-run variance in the usual way (see also UR Vol. 1, chapter 6), but applied to the ranks:

σ²_{r,lr} = lim_{T→∞} E[T^{−1}(S^R_T)²] = lim_{T→∞} E[T^{−1}(Σ_{j=1}^{T} r̃_j)²]   (2.37)

λ_{r,β} ⇒_D (1/σ²_{r,lr}) ∫₀¹ (B(r) − rB(1))² dr   (2.38)
This is a semi-parametric form, in keeping with the intention of the test not to involve a parametric assumption, such as would be involved in using an AR approximation to obtain σ̃²_{r,lr}. Provided M/T → 0 and M → ∞ as T → ∞, then σ̃²_{r,lr} is consistent for σ²_{r,lr}. The scaling factor 12 outside the brackets in (2.39) arises because, if the errors are iid, the limiting value of the first term in the brackets is 12^{−1} as T → ∞; see BG (op. cit., p. 16). This reflects the normalisation that, if the errors are iid, then σ²_{r,lr} = 1.
To ensure positivity of the estimator, a kernel function ω_M(κ) can be used, resulting in the revised estimator:
σ̃²_{r,lr} = 12 [Σ_{t=1}^{T} ν̃_t²/T + 2 Σ_{κ=1}^{M} ω_M(κ) Σ_{t=κ+1}^{T} ν̃_{t−κ} ν̃_t/T]   (2.40)
The frequently applied Newey-West kernel function uses the Bartlett weights
ωM (κ) = 1 − κ/(M + 1) for κ = 1, ..., M.
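A sketch of the estimator (2.40) with the Bartlett weights; here the input sequence stands in for the ν̃_t of the main text, whose precise construction follows the earlier definitions:

```python
import numpy as np

def lrv_bartlett(v, M):
    """Kernel long-run variance estimator in the form of (2.40), using the
    Bartlett weights w_M(kappa) = 1 - kappa/(M + 1) of the Newey-West scheme;
    v stands in for the nu-tilde sequence of the main text."""
    T = len(v)
    s = np.sum(v**2) / T
    for kap in range(1, M + 1):
        w = 1.0 - kap / (M + 1.0)                        # Bartlett weight
        s += 2.0 * w * np.sum(v[:-kap] * v[kap:]) / T    # sum of v_{t-k} v_t
    return 12.0 * s                                      # scaling by 12, as in (2.40)

rng = np.random.default_rng(4)
v = rng.standard_normal(500)
lrv = lrv_bartlett(v, M=8)
```

The Bartlett weights guarantee a non-negative estimate, which is the point of introducing the kernel.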
The revised test statistic, with the same asymptotic distribution as λ_{r,β}, is:

λ_{lr,β} = (1/σ̃²_{r,lr}) λ_{r,β}   (2.41)
An obvious extension gives rise to the version of the test that transforms the ranks using the inverse standard normal distribution, referred to as λ^(ins)_{lr,β}. Some asymptotic critical values are given in the last column (T = ∞) of Table 2.9.
2.3.2.ii Simulation results
Some results from Monte-Carlo simulations in BG (op. cit.) are summarised here, taking T = 100 as illustrative of the general results. First, consider w_t = y_t, so that the data are not transformed, with the ε_t drawn alternately from the N(0, 1), t(2) and centred χ²(2) distributions. Both the rank test statistic λ_{r,β} and the DF test statistic τ̂_β maintain their size well under the different distributions for ε_t, with τ̂_β generally more powerful than λ_{r,β}, especially for non-normal innovations. The advantage of λ_{r,β} is apparent when an additive outlier (AO) of 5σ contaminates the innovations at T/2. In this case λ_{r,β} maintains its size, whereas τ̂_β becomes over-sized, for example an empirical size of 17.4% for a nominal 5% size.
The AO situation apart, there is no consistent advantage in the rank-score test
when it is applied to wt = yt , which is not surprising given its nonparametric
nature. More promising is the case when the random walk is generated in a nonlinear transformation of the data, so that w_t = f(y_t), with f(.) alternately given by f(y_t) = y_t³, f(y_t) = y_t^{1/3}, f(y_t) = ln y_t and f(y_t) = tan(y_t), and the random walk, without drift, is in w_t; that is, the random walk DGP is in the transformed data. In these cases, λ_{r,β} maintains its size, whereas τ̂_β is seriously over-sized and the λ^(ins)_{r,β} version of the test becomes moderately over-sized.
The case where w_t = ln(y_t), but the test uses y_t, was considered in Section 2.1 (referred to as Case 2), where it was noted that the rejection rate was typically of the order of 80–90% using τ̂_μ at a nominal 5% size. As to the rank-score tests, BG (op. cit.) report that λ_{r,β} and λ^(ins)_{r,β} have empirical sizes of 5.1% and 8.2%, respectively. The problematic case of MA(1) errors was also considered and it also remains a problem for the rank-score tests λ_{r,β} and λ^(ins)_{r,β}, with serious under-sizing when the MA(1) coefficient is positive and serious over-sizing when the MA(1) coefficient is negative.
Illustrations of the rank-score tests and other tests introduced below are given
in Section 2.6.3.
2.4.1 The range and new ‘records’
Let the length of the sequence, i, vary from i = 2, …, T, where T is the overall
sample size. For i = 2, y(1) < y(2) (for simplicity equality is ruled out); now
consider adding a 3rd observation, y3 , then there are three possibilities: Case 1,
y(1) < y3 < y(2) ; Case 2, y3 < y(1) < y(2) ; Case 3, y(1) < y(2) < y3 . In Case 1, y(1) is
still the minimum value and y(2) is still the maximum value; thus, the extremes
are unchanged and, therefore, the range remains unchanged; in Case 2, y3 is
the new minimum; whereas in Case 3, y3 is the new maximum. In Cases 2 and
3, the range has changed and, therefore, R_i^F > 0. In the terminology associated with this test, in Cases 2 and 3 there is a new 'record'.
Next, define an indicator variable as a function of R_i^F so that it counts the number of new records as the index i runs from 1 to T. The number of these new records in a sample of size T is Σ_{i=1}^{T} I(R_i^F > 0), where I(R_i^F > 0) = 0 if R_i^F = 0 and I(R_i^F > 0) = 1 if R_i^F > 0. (For simplicity, henceforth, when the nature of the condition is clear, the indicator variable is written as I(R_i^F).) Thus, in the simple illustration above, in Case 1 R_3^F = 0, whereas in Cases 2 and 3 R_3^F > 0. The average number of new records is R̄^F = T^{−1} Σ_{i=1}^{T} I(R_i^F). The intuition behind forming a test statistic based on the range is that the number of new records declines faster for a stationary series (detrended if necessary) compared to a series that is nonstationary because of a unit root. The test described here, referred to as the range unit root, or RUR, test is due to AES (op. cit.).
It is intuitively obvious that a new record for yt is also a new record for a
monotonic transformation of yt , including nonlinear transformations, and this
is the case under the null and the stationary alternative. Simulations reported
in AES (op. cit., especially table IV) show that the nominal size is very much
better maintained for the RUR test compared to the DF t-type test for a wide
range of monotonic nonlinear transformations, typically 3–5% for a nominal
5% test when T ≤ 250, and is virtually the same for T = 500.
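The record-counting logic, and its invariance to monotonic transformations, can be sketched directly (the function is illustrative):

```python
import numpy as np

def count_new_records(y):
    """Number of indices i >= 2 at which y_i falls outside the running
    [min, max] range, i.e. I(R_i^F > 0) = 1 (a new 'record')."""
    lo = hi = y[0]
    n = 0
    for v in y[1:]:
        if v < lo or v > hi:     # range changes: new minimum or maximum
            n += 1
            lo, hi = min(lo, v), max(hi, v)
    return n

# A new record for y_t is also a new record for a monotonic transform of y_t:
rng = np.random.default_rng(1)
y = np.exp(np.cumsum(0.01 * rng.standard_normal(500)))   # everywhere positive
assert count_new_records(y) == count_new_records(np.log(y))
```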
Thus far the implicit background has been of distinguishing a driftless random
walk from a trendless stationary process. However, care has to be taken when
dealing with DGPs that involve trending behaviour, since the number of new
records will increase with the trend. The alternatives in this case are a series non-
stationary by virtue of a unit root, but with drift, and a series stationary around
a deterministic (linear) time trend: both will generate trending behaviour. The
trending case is dealt with after the trendless case; see Section 2.4.4.
this could be the logarithm, or some other monotonic transformation, of the
original series. Otherwise the set-up is standard, with the null hypothesis (in this
section) that yt is generated by a driftless random walk, yt = yt−1 + zt , where
zt = εt , and {εt }Tt=1 is an iid sequence of random variables with E(εt ) = 0 and
variance σε2 .
As a preliminary to the test statistic, consider the properties of a function of the sample size, say g(T), such that the following condition is (minimally) satisfied:

g(T) Σ_{i=1}^{T} I(R_i^F > 0) = O_p(1)   (2.42)

Then for stationary series, g(T) = ln(T)^{−1}; for I(1) series without drift, g(T) = 1/√T; and for I(1) series with drift, g(T) = 1/T. (The first of these results requires a condition that the i-th autocovariance decreases to zero faster than ln(T)^{−1}; see AES (op. cit.); this condition, known as the Berman condition, is satisfied for Gaussian ARMA processes.)
Thus, a test statistic based on the range for an I(1) series without drift can be formed as:

R^F_UR = T^{−1/2} Σ_{i=1}^{T} I(R_i^F)   (2.43)

Given the rate of decrease of new records, R^F_UR will tend to zero in probability for a stationary time series and to a non-degenerate random variable for an I(1) series. Thus, relatively, R^F_UR will take large values for an I(1) series and small values for a stationary series.
[Table 2.10 source: based on AES (op. cit., table I); 10,000 replications of the null model with ε_t ∼ niid(0, 1).]

R^F_UR ⇒_D (2/π) (ξ + η)² e^{−(1/2)(ξ+η)²}   (2.44)

where ξ →_p B̄(1) and η →_p L_B(0, 1), the latter being the local time at zero of a Brownian motion on [0, 1]; see AES (op. cit., Definition 1 and Appendix A2).
Critically, AES (op. cit., Theorem 1, part (3)) show that if y_t is a stationary Gaussian series (with covariance sequence satisfying the Berman condition), then as T → ∞:

R^F_UR →_p 0
AES show that the finite sample distribution for T = 1, 000, under the null yt =
yt−1 + εt , with εt ∼ niid(0, 1), t ≥ 1, is very close to the asymptotic distribution
of their Theorem 1, especially in the left-tail. Some critical values are reported
in Table 2.10; this table also includes the quantiles for 90% and 95% as they are
required for the situation when the test is used to distinguish between trending
alternatives.
2.4.1.ii Robustness of R^F_UR
AES (op. cit.) show that R^F_UR is robust to a number of important departures from standard assumptions. In particular it is robust to the following problems.
i) Departures from niid errors (which is what the finite sample, simulated crit-
ical values are based upon); for example, draws from fat-tailed distributions
such as the ‘t’ distribution and the Cauchy distribution, and asymmetric
distributions (see AES, op. cit., table III).
ii) Unlike the parametric DF-based tests, R^F_UR is reasonably robust to structural breaks in the stationary alternative. The presence of such breaks may 'confuse' standard parametric tests into assigning these to permanent effects, whereas they are transitory, leading to spurious non-rejection of the unit root null hypothesis. The cases considered by AES were: the single level shift, that is where the alternative is of the form y_t = ρy_{t−1} + δ₁D_{t,1} + ε_t, |ρ| < 1 and D_{t,1} = 0 for t ≤ t_{b,1}, D_{t,1} = 1 for t > t_{b,1}, where t_{b,1} = T/2; and a multiple shift, in this case where the alternative is of the form y_t = ρy_{t−1} + Σ_{i=1}^{2} δ_i D_{t,i} + ε_t, where |ρ| < 1 and D_{t,i} = 0 for t ≤ (T/4)i, D_{t,i} = 1 for (T/4)i < t_{b,i} ≤ (T/2)i. AES (op. cit.) found that: (a) R^F_UR outperformed the DF t-type test, which had no power in most break scenarios; (b) but for the power of R^F_UR to be maintained, compared to the no-break case, the sample size had to be quite large, for example T = 500; (c) the power of R^F_UR deteriorated in the two-break
R^{FB}_UR = (1/√(2T)) [Σ_{i=1}^{T} I(R_i^F) + Σ_{i=1}^{T} I(R_i^B)]   (2.45)
where R_i^B counts the new records on the time-reversed series y^B_t = y_{T−t+1}, t = 1, 2, . . . , T. The revised test statistic R^{FB}_UR has better size fidelity than R^F_UR for AOs in the null DGP early or late in the sample (but not both); it has better power compared to R^F_UR against stationary alternatives with a single structural break.
Given the motivation for using a nonparametric test statistic, we consider the
sensitivity of the finite sample critical values to different error distributions. By
way of an illustration of robustness, consider the quantiles when {εt } is an iid
sequence and alternatively drawn from a normal distribution and the t distri-
bution with 3 degrees of freedom, t(3). The latter has substantially fatter tails
than the normal distribution. One question of practical interest is: what is the
actual size of the test if the quantiles assume εt ∼ niid(0, 1) but, in fact, they
are drawn from t(3)? In this case, and taking T = 100, 250, by way of example,
the actual size is then compared with the nominal size. By way of benchmarking, the results are also reported for the DF t-type test statistics, τ̂_μ and τ̂_β. Some
of the results are summarised in Table 2.12, with QQ plots in Figures 2.3 and
2.4, the former for the DF tests and the latter for the range unit root tests. (The
QQ plots take the horizontal axis as the quantiles assuming εt ∼ niid(0, 1) and
the vertical axis as the quantiles realised from εt ∼ t(3); differences from the 45◦
line indicate a departure in the pairs of quantiles.)
There are four points to note about the results in Table 2.12 and the
illustrations in Figures 2.3 and 2.4.
i) The DF t-type tests are still quite accurate as far as nominal and actual size
are concerned; where there is a departure, it tends to be at the extremes of
the distribution (see Figure 2.3), which are not generally used for hypothesis
testing.
[Table 2.12 Empirical size when ε_t ∼ t(3), using quantiles based on ε_t ∼ niid(0, 1)]

            1%       5%       10%
T = 100
τ̂_μ        1.08%    5.06%    10.32%
τ̂_β        1.39%    5.03%    9.55%
R^F_UR      1.85%    5.35%    10%
R^{FB}_UR   1.22%    5%       10%
T = 250
τ̂_μ        1.39%    5.43%    10.27%
τ̂_β        1.07%    4.87%    9.32%
R^F_UR      1.57%    6.16%    10.13%
R^{FB}_UR   1%       5%       10%
[Figure 2.3 QQ plots for the DF tests τ̂_μ and τ̂_β, T = 250: niid(0, 1) quantiles (horizontal axis) against t(3) quantiles (vertical axis); departure at the upper quantiles.]

[Figure 2.4 QQ plots for the range unit root tests, T = 250: niid(0, 1) quantiles against t(3) quantiles; lower panel, R^{FB}_UR (forward–backward test), close to perfect alignment.]
iii) R^{FB}_UR maintains its size quite generally throughout the range (see Figure 2.4, lower panel).
iv) Overall, although the differences among the test statistics are not marked, R^{FB}_UR is better at maintaining size compared to R^F_UR.
Further, reference to AES (op. cit., table III) indicates that there is a power gain in using R^F_UR, compared to a DF test, for a number of fat-tailed or asymmetric distributions, for near-unit root alternatives, for example for ρ = 0.95 to 0.99 when T = 250.
Another practical issue related to robustness concerns the characteristics of the RUR tests when the sequence zt does not comprise iid random variables (that is, zt ≠ εt). AES (op. cit.) report simulation results for size and power when zt is generated by an MA(1) process, zt = εt + θ1εt−1, with θ1 taking values in the range 0.5 to −0.8. AES find that, with a sample size of T = 100 and a 5% nominal size, both R^F_UR and R^FB_UR are (substantially) under-sized for θ1 = 0.5, for example an actual size of 0.4% for R^F_UR and 0.6% for R^FB_UR. In the case of negative values of θ1, which is often the case of greater interest, R^F_UR and R^FB_UR are over-sized, which is the usual effect on unit root tests in this case, but R^FB_UR performs better than R^F_UR and is only moderately over-sized; for example, when θ1 = −0.6, the empirical sizes of R^F_UR and R^FB_UR are 20% and 9%, respectively. However, the DF t-type test, with lag selection by MAIC, due to Ng and Perron (2001) (see UR, Vol. 1, chapter 9 for details of MAIC), outperforms both R^F_UR and R^FB_UR in terms of size fidelity.
The following describes the procedure suggested by AES (op. cit.) for such situations.
The stochastically trended and deterministically trended alternatives are, respectively:

yt = β1 + yt−1 + εt,  β1 ≠ 0    (2.46)

(yt − β0 − β1 t) = zt,  zt ∼ I(0)    (2.47)

In the first case, yt ∼ I(1) with drift, which generates a direction or trend to the random walk, whereas in the second case, yt ∼ I(0) about a deterministic trend. The key result as far as the RUR test is concerned is that R^F_UR and R^FB_UR are divergent under both processes, so that:

R^F_UR = Op(T^{1/2}) → ∞ as T → ∞

under both (2.46) and (2.47). Thus, the right tail of the distribution of R^F_UR is now relevant for hypothesis testing, where the null hypothesis is of a driftless random walk against a trended alternative of the form (2.46) or (2.47); for appropriate upper quantiles, see Table 2.10 for R^F_UR and Table 2.11 for R^FB_UR.
As the divergence of R^F_UR follows for both the I(1) and the trended I(0) alternatives, it is necessary to distinguish these cases by a further test. The following procedure was suggested by AES (op. cit.). The basis of the supplementary test is that if yt ∼ I(1) then Δyt ∼ I(0), whereas if yt ∼ I(0) then Δyt ∼ I(−1), so the test rests on distinguishing the order of integration of Δyt.
If a variable xt is I(0), then its infinite lagged sum is I(1), so summation raises the order of integration by one each time the operator is applied. The same result holds if a series is defined as the infinite lagged sum of the even-order lags. The sequence x̃t^(k) is defined as follows:

x̃t^(k) ≡ Σ_{j=0}^{∞} L^j x̃_{t−j}^{(k−1)}    (2.48)

where x̃t^(0) ≡ xt, so that applying the operator zero times is the identity.
Some of the possibilities of interest for determining the integration order are:
A: if xt ∼ I(0), then x̃t^(1) ≡ Σ_{j=0}^{∞} L^j x_{t−j} ∼ I(1).
B.i: if xt ∼ I(−1), then x̃t^(1) ≡ Σ_{j=0}^{∞} L^j x_{t−j} ∼ I(0).
B.ii: if x̃t^(1) ∼ I(0), then x̃t^(2) ≡ Σ_{j=0}^{∞} L^j x̃_{t−j}^(1) ∼ I(1).
These results may now be applied in the context that xt = Δyt, giving rise to two possibilities that enable a drifted random walk to be distinguished from a trend stationary process:
Possibility A:
If xt ∼ I(0), then x̃t^(1) ∼ I(1), and using R^F_UR (or R^FB_UR) with the variable x̃t^(1) should result in non-rejection of the null hypothesis.
Possibility B:
If xt ∼ I(−1), then x̃t^(1) ∼ I(0), and using R^F_UR (or R^FB_UR) with the variable x̃t^(1) should result in rejection of the null hypothesis; further, x̃t^(2) ∼ I(1), and using R^F_UR (or R^FB_UR) with the variable x̃t^(2) should result in non-rejection of the null hypothesis.
Hence, given that in the case of the trending alternative a rejection of the null hypothesis has first occurred using the upper quantiles of the null distribution, the decision is to conclude in favour of a drifted random walk if possibility A results, but in favour of a trend stationary process if possibility B results. Since the overall procedure will involve not less than two tests, each at, say, an α% significance level, the overall significance level will (generally) exceed α%; for example, in the case of two independent tests, the upper limit on the overall size is 1 − (1 − α)². Note that the infinite sums implied in the computation of x̃t^(j) have to be truncated to start at the beginning of the sample. An example of the use of these tests is provided in Section 2.6.3.ii as part of a comparison of a number of tests in Section 2.6.
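The truncated summation in (2.48) is simple to implement; the following is a minimal sketch (the function name is illustrative, and the infinite sum is truncated at the beginning of the sample, as noted above):

```python
import numpy as np

def even_lag_sum(x):
    """One application of the summation operator in (2.48): the sum of
    even-order lags, x~_t = x_t + x_{t-2} + x_{t-4} + ..., with the
    infinite sum truncated at the beginning of the sample."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for t in range(len(x)):
        out[t] = x[t::-2].sum()   # x_t, x_{t-2}, x_{t-4}, ...
    return out
```

Applying the operator once to xt = Δyt gives x̃t^(1) (Possibility A); applying it to the result gives x̃t^(2) (Possibility B.ii).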
Another basis for unit root tests that uses less than a full parametric structure is the different behaviour of the variance of the series under the I(1) and I(0) possibilities. Variance ratio tests are based on the characteristic of the partial sum (unit root) process with iid errors that the variance increases linearly with time. Consider the simplest case where yt = y0 + Σ_{i=1}^{t} εi, with εt ∼ iid(0, σε²) and, for simplicity, assume that y0 is nonstochastic; then var(yt) = tσε². Further, note that Δ1yt ≡ yt − yt−1 = εt, where the subscript on Δ1 emphasises that this is the first difference operator; higher-order differences are indicated by the subscript. Also var(Δ1yt) = σε² and Δ2yt ≡ yt − yt−2 ≡ Δ1yt + Δ1yt−1; hence, by the iid assumption, var(Δ2yt) = 2σε². Generalising in an obvious way, var(Δqyt) = qσε², where Δqyt ≡ yt − yt−q ≡ Σ_{i=0}^{q−1} Δ1yt−i. In words, the variance of the q-th order difference is q times the variance of the first-order difference.
Moreover, this result generalises to heteroscedastic variances provided {εt} remains a serially uncorrelated sequence. For example, introduce a time subscript on σε², say σε,t², to indicate heteroscedasticity; then var(Δqyt) = Σ_{i=0}^{q−1} σε,t−i², so that the variance of the q-th order difference is the sum of the variances of the q first-order differences.
For simplicity, consider the homoscedastic case; then a test of whether yt has been generated by the partial sum process with iid errors can be based on the ratio of (1/q) times the variance of Δqyt to the variance of Δ1yt. This variance ratio is:

VR(q) ≡ (1/q) var(Δqyt)/var(Δ1yt)    (2.49)

The quantity VR(q) can be calculated for different values of q and, in each case, it will be unity under the null hypothesis, H0: yt = y0 + Σ_{i=1}^{t} εi, with εt ∼ iid(0, σε²).
Hence, the null incorporates, or is conditional on, a unit root, with the focus on
the lack of serial correlation of the increments, εt . The nature of the alternative
hypothesis, therefore, requires some consideration; for example, rejection of H0
that VR(q) = 1 could occur because the εt are serially correlated, which is not
a rejection of the unit root in H0 ; however, rejection could also occur if there
is not a unit root. Values of VR(q) that are different from unity will indicate
rejection of the null hypothesis of a serially uncorrelated random walk.
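As a numerical illustration of (2.49), a minimal sketch of the estimator (the function name is illustrative; sample variances are used, without drift or degrees-of-freedom corrections):

```python
import numpy as np

def variance_ratio(y, q):
    """VR(q) of (2.49): (1/q) * var(q-th difference) / var(first difference),
    using sample variances; no drift or degrees-of-freedom correction."""
    y = np.asarray(y, dtype=float)
    d1 = y[1:] - y[:-1]    # first differences
    dq = y[q:] - y[:-q]    # q-th order differences
    return dq.var() / (q * d1.var())
```

For a simulated driftless random walk, VR(q) is close to unity for each value of q, consistent with the null.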
This procedure generalises if there are deterministic components in the generation of yt. For example, assume that (yt − μt) = ut, where E(ut) = 0 and μt includes deterministic components, typically a constant and/or a (linear) trend; as usual, the detrended series is ỹt = yt − μ̂t, where μ̂t is an estimate of the trend, and the variance ratio is then based on ỹt rather than the original series, yt. The overlapping sample estimator of the variance ratio, allowing for drift, is:

OVR(q) ≡ (1/q) [T⁻¹ Σ_{t=q+1}^{T} (Δqyt − qβ̂1)²] / [T⁻¹ Σ_{t=2}^{T} (Δ1yt − β̂1)²]    (2.50)

where β̂1 is the sample mean of Δ1yt. The notation reflects the overlapping nature of the data used in this estimator; this is discussed further below. The estimator OVR(q) allows for drift in the null of a random walk, as the variable in the numerator is the q-th difference, Δqyt, minus an estimator of its mean, qβ̂1. An alternative notation is sometimes used: let xt ≡ Δ1yt; then Δqyt ≡ Σ_{j=0}^{q−1} xt−j and this term is used in OVR(q). Also, the quantity in (2.50) is sometimes defined using a divisor that makes a correction for degrees of freedom in estimating the variance; see Lo and MacKinlay (1989).
A test based on OVR(q) uses overlapping data in the sense that successive elements have a common term or terms; for example, for q = 2, Δ2yt = Δ1yt + Δ1yt−1 and Δ2yt−1 = Δ1yt−1 + Δ1yt−2, so the 'overlapping' element is Δ1yt−1. To put this in context, suppose the Δ1yt are daily returns on a stock price; then, assuming five working days, a weekly return can be defined as the sum of five consecutive daily returns. Thus, for example, Δ5yt = Σ_{j=0}^{4} Δ1yt−j, with successive element Δ5yt+1 = Σ_{j=0}^{4} Δ1yt+1−j, so that the overlapping elements are Σ_{j=0}^{3} Δ1yt−j. This differs from the weekly return defined as the sum of the five daily returns in each week, the time series for which would be non-overlapping. A non-overlapping version of VR(q), NOVR, is also possible, although the overlapping version is likely to have higher power; see Lo and MacKinlay (1989) and also Tian, Zhang and Huan (1999) for the exact finite-sample distribution of the NOVR version.
Lo and MacKinlay (1988) showed that:

√T (OVR(q) − 1) ⇒D N(0, 2(2q − 1)(q − 1)/(3q))    (2.51)

and hence that:

√T (OVR(q) − 1) [3q/(2(2q − 1)(q − 1))]^{1/2} ⇒D N(0, 1)    (2.52)
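A hedged sketch of the overlapping estimator and the standardised statistic in (2.52); the function names are illustrative, the drift is estimated by the mean first difference, and the degrees-of-freedom correction mentioned above is omitted:

```python
import numpy as np

def ovr(y, q):
    """Overlapping variance-ratio estimator in the spirit of (2.50)
    (Lo-MacKinlay), allowing for drift: the numerator uses overlapping
    q-th differences centred at q times the mean first difference."""
    y = np.asarray(y, dtype=float)
    T = len(y) - 1               # number of first differences
    mu = (y[-1] - y[0]) / T      # drift estimate: mean first difference
    d1 = y[1:] - y[:-1]
    dq = y[q:] - y[:-q]
    s1 = np.mean((d1 - mu) ** 2)
    sq = np.mean((dq - q * mu) ** 2)
    return sq / (q * s1)

def ovr_z(y, q):
    """Standardised statistic of (2.52): sqrt(T)*(OVR-1) divided by the
    asymptotic standard deviation sqrt(2(2q-1)(q-1)/(3q)); approximately
    N(0,1) under the iid random-walk null."""
    T = len(y) - 1
    v = 2.0 * (2 * q - 1) * (q - 1) / (3.0 * q)
    return np.sqrt(T) * (ovr(y, q) - 1.0) / np.sqrt(v)
```

Under the null the standardised statistic is approximately standard normal; for an over-differenced (already stationary) series, OVR(q) falls well below unity.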
Several practical points arise in using OVR(q). The first concerns bias: if b(ψ) is the bias of an estimator ψ̂ of ψ, then a corrected version subtracts the bias from the estimator; thus E(ψ̂ − b(ψ)) = ψ. To this end, OVR(q) can be written as a function of the sample autocorrelations ρ̂(j) (see Cochrane (1988)), that is:

OVR(q) = 1 + 2 Σ_{j=1}^{q−1} (1 − j/q) ρ̂(j)    (2.53)
The expected value of ρ̂(j) for a serially uncorrelated time series, ignoring terms of higher order than O(T⁻¹), is given by Kendall (1954):

E(ρ̂(j)) = −1/(T − j)    (2.54)

This expectation should be zero for an unbiased estimator; hence (2.54) is also the first-order bias of ρ̂(j). The bias is negative in finite samples, but disappears as T → ∞, for fixed j. Thus, ρ̂(j) + (T − j)⁻¹ is a first-order unbiased estimator of ρ(j) under the null of serially uncorrelated increments.
Noting that E(OVR(q)) is a linear function of the E(ρ̂(j)), the first-order bias in estimating OVR(q) by (2.50) under the null is:

B(OVR(q)) = −(2/q) Σ_{j=1}^{q−1} (q − j)/(T − j)    (2.56)
See also Shiveley (2002). An estimator that is unbiased to order O(T−1 ) under the
null is obtained as ROVR(q) = OVR(q) – B(OVR(q)). However, the bias in (2.54)
is only relevant under the null, so that a more general approach would correct
for bias in estimating ρ(j) whether under the null or the alternative hypotheses.
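The bias term in (2.56) is just a finite sum; for example:

```python
def ovr_bias(T, q):
    """First-order bias B(OVR(q)) of (2.56), valid under the null of
    serially uncorrelated increments."""
    return -(2.0 / q) * sum((q - j) / (T - j) for j in range(1, q))

# ROVR(q) = OVR(q) - ovr_bias(T, q) is then unbiased to order O(1/T)
# under the null.
```

The bias is negative and shrinks as T grows, in line with (2.54).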
The second point is that if q is large relative to T, then the standard normal
distribution does not approximate the finite sample distribution very well, with
consequent problems for size and power; see Richardson and Stock (1989) and
Deo and Richardson (2003). An alternative asymptotic analysis with q → ∞
as T → ∞, such that (q/T) → λ, provides a better indication of the relevant
distribution from which to obtain quantiles for testing H0 . Deo and Richardson
(2003) showed that under this alternative framework, OVR(q) converges to the
following non-normal limiting distribution:
OVR(q) ⇒ [1/((1 − λ)² λ)] ∫_λ^1 (B(s) − B(s − λ) − λB(1))² ds    (2.57)
for which critical values can be obtained by simulation; see Deo and Richardson
(2003).
The third point is that variance ratio tests are often carried out for different values of q; this is because, under the null, the variance ratio is unity for all values of q. Joint procedures exist for such multiple applications of the variance ratio test and it is, therefore, possible to control for the overall size.
Fourth, the random walk hypothesis is often of interest in situations where
homoscedasticity is unlikely; for example, there is a considerable body of evi-
dence that financial time series are characterised by heteroscedasticity, including
ARCH/GARCH-type conditional heteroscedasticity. The variance ratio test can
be generalised for this and some other forms of heteroscedasticity, which again
results in a test statistic that is distributed as standard normal under the null
hypothesis; see Lo and MacKinlay (1989).
A problem with variance ratio-based tests is that they are difficult to generalise to non-iid errors; however, Breitung (2002) has suggested a test based on a variance ratio principle, outlined in the next section, that is invariant to the short-run dynamics and is robust to a number of departures from normally distributed innovations.
The simplest version of the statistic is:

νK = [T⁻² Σ_{t=1}^{T} Yt²] / [T⁻¹ Σ_{t=1}^{T} yt²]    (2.58)

where Yt ≡ Σ_{i=1}^{t} yi; see also UR, Vol. 1, chapter 11. The denominator is an estimator of the long-run variance assuming no serial correlation; if this is not the case, it is replaced by another estimator, either a semi-parametric estimator with a kernel function, such as the Newey-West estimator, or a parametric version that uses a 'long' autoregression (see UR, Vol. 1, chapter 6).
The statistic νK is the ratio of the variance of the partial sum to the variance of yt, normalised by T. If the null hypothesis is of stationarity about a non-zero constant or a linear trend, then the KPSS test statistic is defined in terms of the detrended series ỹt, so that:

νK = T⁻¹ Σ_{t=1}^{T} Ỹt² / Σ_{t=1}^{T} ỹt²    (2.59)

where Ỹt is the partial sum of ỹt, that is Ỹt ≡ Σ_{i=1}^{t} ỹi. Breitung's variance ratio statistic, νrat, scales this ratio by a further factor of T⁻¹, that is νrat = T⁻² Σ_{t=1}^{T} Ỹt² / Σ_{t=1}^{T} ỹt², so that it possesses a non-degenerate limiting distribution under the I(1) null.
When using νrat as a test of stationarity, the critical values come from the right
tail of the null distribution of νrat . However, Breitung (2002) suggests using νrat as
a unit root test, that is of the null hypothesis that yt is I(1) against the alternative
that yt is I(0), so that the appropriate critical values for testing now come from
the left tail of the null distribution.
Breitung (2002, Proposition 3) shows that νrat has the advantage that its asymptotic null distribution does not depend on the nuisance parameters, in this case those governing the short-run dynamics. The limiting null distribution when there are no deterministic components is:

νrat ⇒ [∫_0^1 (∫_0^r B(s) ds)² dr] / [∫_0^1 B(r)² dr]    (2.60)

When the series is demeaned or linearly detrended, B(r) is replaced by the corresponding demeaned or detrended Brownian motion:

B(r)μ = B(r) − ∫_0^1 B(s) ds    (2.62)

B(r)β = B(r) + (6r − 4) ∫_0^1 B(s) ds − (12r − 6) ∫_0^1 sB(s) ds    (2.63)
                 νrat                    νrat,μ                  νrat,β
T          10%     5%      1%      10%     5%      1%      10%     5%      1%
100      0.0313  0.0215  0.0109  0.0144  0.0100  0.0055  0.0044  0.0034  0.0021
250      0.0293  0.0199  0.0097  0.0143  0.0100  0.0056  0.0044  0.0034  0.0022
500      0.0292  0.0199  0.0099  0.0147  0.0105  0.0054  0.0044  0.0036  0.0026
Source: Breitung (2002, table 5); a sample value less than the α% critical value leads to rejection of H0; the column grouping and significance labels are inferred from the left-tail rejection rule.
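The statistic is particularly simple to compute; a minimal sketch, assuming OLS demeaning or linear detrending and computing the sum of squared partial sums of the detrended series divided by T² times its sum of squares (the function name and interface are illustrative):

```python
import numpy as np

def breitung_nu(y, det="mu"):
    """Sketch of Breitung's (2002) variance-ratio statistic: demean ('mu')
    or linearly detrend ('beta') the series, form the partial sums, and
    return (sum of squared partial sums) / (T^2 * sum of squared series).
    Small values lead to rejection of the I(1) null (left-tail test)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    if det == "mu":
        ytil = y - y.mean()
    else:  # residuals from a regression on a constant and linear trend
        X = np.column_stack([np.ones(T), np.arange(1.0, T + 1.0)])
        ytil = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    Y = np.cumsum(ytil)
    return (Y ** 2).sum() / (T ** 2 * (ytil ** 2).sum())
```

For an I(1) series the statistic is Op(1), whereas for an I(0) series it collapses towards zero, which is why small values reject the unit root null.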
The models for which νrat,j is relevant are of the following familiar form:

ỹt = ρỹt−1 + zt
zt = ω(L)εt
ỹt = yt − μt

where ω(L) = 1 + Σ_{j=1}^{J} ωj L^j, with εt ∼ iid(0, σε²), σε² < ∞ and Σ_{j=1}^{J} j²ωj² < ∞; J may be infinite, as in the case that the generation process of zt includes an AR component. Error processes other than the standard linear one are allowed; for example, zt could be generated by a fractionally integrated process, or a non-linear process, provided that it admits a similar Beveridge-Nelson type of decomposition; see Breitung (2002) for details.
Breitung (2002) reports some simulation results for the problematic case with an MA(1) error generating process, so that Δyt = zt , with zt = (1 + θ1 L)εt and
T = 200, where the test is based alternatively on the residuals from a regression
of yt on a constant or a constant and a trend. The results indicate a moderate
size bias for νrat,j , j = μ, β, when θ1 = −0.5, which increases when θ1 = −0.8; whether this is better than the ADF-based τ̂j depends on the lag length used in the latter, with ADF(4) suffering a worse size bias, but ADF(12) being better on size though worse on power.
The simulation results are for the case where the DGP is: yt = ρyt−1 + zt , zt =
(1 + θ1 L)εt , ρ = (1, 0.95, 0.9, 0.85), θ1 = (0, –0.5) and T = 200; size is evaluated
when ρ = 1 for different values of θ1 , whereas power is evaluated for ρ < 1. The
nominal size of the tests is 5%, and the alternative innovation distributions are
N(0, 1), t(3) and Cauchy. The rank score-based test λlr,β requires estimation of
the long-run variance when θ1 = −0.5 (see Section 2.3.2, especially Equation
(2.40)). A truncation parameter of M = 8 was used, as in BG (op. cit.) and, for
comparability, a lag of 8 was used for the DF tests.
The results are summarised in Table 2.14 in two parts. The first set of results
is for θ1 = 0 and the second set for θ1 = –0.5. In the case of ρ < 1, both ‘raw’ and
size-adjusted power are reported, the latter is indicated by SA.
The first case to be considered is when the errors are normal and there is no
MA component, so there is no distinction between errors and innovations. Size
is generally maintained at its nominal level throughout. The standard DF tests
might be expected to dominate in terms of power; however, this advantage is not
uniform and the rank-based DF tests are comparable despite the fact that they
only use part of the information in the sample; for example, the rank-based
t-type test is more powerful than its parametric counterpart and is the most
powerful of the nonparametric tests considered here. An explanation for this
finding is that the mean of the ranks, r̄, is known, whereas in the parametric test
the mean has to be estimated, which uses up a degree of freedom; see Granger
and Hallman (1991) for further details. Of the other nonparametric tests, R^FB_UR is
in second place. In the power evaluation, the rank-based score tests λr,β and λlr,β
are likely to be at a disadvantage because in this simulation set-up, where H0 is
a random walk without drift, they over-parameterise the alternative, and this is
confirmed in the simulation results. Their relative performance would improve
against δ̂r,β and τ̂r,β .
In the case of the innovations drawn from t(3), there are some slight size
distortions. Otherwise the ranking in terms of power is generally as in the normal
innovations case; of note is that νrat,μ gains somewhat in power whereas λr,μ
loses power. The Cauchy distribution is symmetric, but has no moments, and
may be considered an extreme case: here the DF rank-based tests δ̂r,μ and τ̂r,μ become under-sized, whereas R^F_UR is over-sized; apart from R^F_UR,
Table 2.14 Summary of size and power for rank, range and variance ratio tests

θ1 = 0
εt ∼ t(3)     δ̂μ      τ̂μ      δ̂r,μ    τ̂r,μ    R^F_UR   R^FB_UR  λr,β    νrat,μ
ρ = 1         4.10    5.20    3.90    3.85    5.95     3.60     4.95    4.85
ρ = 0.95     22.35   12.90   19.45   19.50   17.75    20.85    10.25   18.10
ρ = 0.90     56.30   37.50   47.20   45.15   31.55    32.90    16.85   32.95
ρ = 0.85     90.10   74.55   77.40   76.15   44.45    54.20    24.70   47.45

θ1 = −0.5
εt ∼ Normal   δ̂μ      τ̂μ      δ̂r,μ    τ̂r,μ    R^F_UR   R^FB_UR  λlr,β   νrat,μ
ρ = 1          9.7     4.2     6.9     1.5    26.6     33.4     53.5     6.5
ρ = 0.95      59.8    24.8    48.2    16.9    76.2     91.8     86.4    40.1
SA            43.1    28.5    39.9    38.1    19.4     36.0     14.1    35.4
ρ = 0.90      93.3    61.1    84.5    51.2    92.0     99.5     93.8    69.1
SA            82.6    66.3    78.0    76.9    38.6     67.2     34.0    64.5
ρ = 0.85      99.1    84.7    95.7    78.0    95.8    100.0     93.6    85.8
SA            96.0    88.5    93.2    93.0    51.2     81.4     41.4    83.2

Notes: T = 200; the first row of each entry is the 'raw' size or power, and the SA row is the size-adjusted power when ρ < 1; best performers in bold in the original.
power is clearly lost throughout. Overall, δ̂μ remains the best test; R^F_UR has size-adjusted power that is better than that of δ̂μ for part of the parameter space, but since it is over-sized the comparison is misleading; a bootstrapped version that corrects the incorrect size may have something to offer, but is not pursued here.
A more problematic case for unit root testing is when there is an MA com-
ponent to the serial correlation, for example, in the first-order case when the
MA coefficient is negative, and θ1 = –0.5 is taken here as illustrative. This case
is known to cause difficulties to the parametric tests, for which some protection
is offered by extending the lag in an ADF model. A priori this may offer some
relative advantage to νrat , which is invariant to the serial correlation.
The first question, when ρ = 1 and θ1 = −0.5, relates to the retention of size, where the nominal size is 5% in these simulations. In the case of normal innovations, the best of the nonparametric tests is now νrat,μ, with R^F_UR and R^FB_UR suffering size distortions. The size distortion of λlr,μ is consistent with the results
reported in BG (op. cit.). Of the ADF rank tests, δ̂r,μ is slightly over-sized, whereas
τ̂r,μ is under-sized. The lag length is long enough for τ̂μ to maintain its size rea-
sonably well at 4.2%, but as observed in UR, Vol. 1, chapter 6, δ̂μ is not as robust
as τ̂μ . The comparisons of power will now only make sense for the tests that
do not suffer (gross) size distortions. Both ‘raw’ and size-adjusted (SA) power is
given for the cases where ρ < 1. In this case, it is now the simple nonparametric
test νrat,μ that has the advantage over the other tests.
The next question relates to the effect of non-normal innovations on the
performance of the various tests. When the innovations are drawn from t(3),
τ̂μ again maintains its size reasonably well at 4.0%, and δ̂r,μ and νrat,μ are next
best at 7.7%. The most powerful (in terms of SA power) of the tests is δ̂r,μ , with
honours fairly even between τ̂μ and νrat,μ depending on the value of ρ. With
Cauchy innovations, τ̂μ , δ̂r,μ and νrat,μ are comparable in terms of size retention,
whereas δ̂r,μ and νrat,μ are best in terms of SA power.
Overall, there is no single test that is best in all the variations considered
here. The rank-based tests are useful additions to the ‘stock’ of tests, but the test
least affected by the variations considered here is Breitung’s νrat , which shows
good size fidelity and reasonable power throughout, and is particularly simple
to calculate.
yt           δ̂μ       δ̂β       τ̂μ       τ̂β       λr,μ     νrat,μ
σ² = 0.5    90.3%    82.6%    83.9%    74.5%    6.2%     5.0%
σ² = 1.0    95.9%    95.8%    94.8%    94.4%    6.4%     4.7%
Notes: T = 200; other details are as in Section 2.1.2 and Table 2.2; 5% nominal size.
Case 1, where the unit root is in the levels of the series but the tests are based on the logs, and Case 2, where the unit root is in the logs but the tests are based on the levels. The results, reported in Table 2.15, show that, unlike the DF tests, λr,μ and νrat,μ maintain close to their nominal size of 5% in both cases.
correlated (see Section 2.4.3 and AES, op. cit., section 6), and here the sample
value is virtually identical to the 5% critical value. Of course, the use of multiple
tests implies that the overall significance level will generally exceed the nominal
significance level used for each test.
The usual predicament, anticipated in Section 2.2, therefore, arises in that the
tests are unable to discriminate between whether the unit root is in the ratio or
the log ratio, and a further test is required to assess this issue. This point was
considered in Section 2.2.3, where the KM test suggested that the unit root was
in the ratio (levels) if the comparison is limited to linear integrated or log-linear
integrated processes.
Table 2.16 Tests for a unit root: ratio of gold to silver prices
[Figure: the ratio of gold to silver prices, 1980–2000; left panel in levels (vertical scale ×10⁷), right panel in logs.]
ln yt 0.240 –0.017 –0.090 –0.237 –0.162 0.0003
(‘t’) (–0.45) (–1.42) (–3.81) (–2.54) (1.98)
ln yt 0.294 –0.020 –0.113 –0.282 –0.213 –0.087 –0.185 –0.0004
(‘t’) (1.77) (–0.53) (–1.77) (–4.42) (–3.28) (–1.36) (–2.90) (0.75)
Notes: a β subscript indicates that the test statistic is based on a linear trend adjusted series; appropriate
critical values were simulated for the test statistics, based on 50,000 replications; Breitung’s λr,μ statistic
allows for drift, if present, under the null.
The parametric structure for the ADF tests is now somewhat more difficult
to pin down. In the case of the levels data, yt , both marginal-t and AIC select
ADF(5), with τ̂β = –2.853, which is greater than the 5% cv of –3.4, which leads
to non-rejection of the null; however, BIC suggests that no augmentation is
necessary and τ̂β = –4.729, which, in contrast, leads to rejection of the null
hypothesis. When the log of yt is used, BIC selects ADF(3), with τ̂β = –0.45,
which strongly suggests non-rejection; AIC and marginal-t both select ADF(5),
with τ̂β = –0.53, confirming this conclusion. At this stage, based on these tests,
it appears that if there is a unit root then it is in the logarithm of the series; see
Table 2.17 for a summary of the regression results and Table 2.18 for the unit
root test statistics.
The next step is to compare the ADF test results with those for the nonpara-
metric tests. The rank ADF tests now use (linearly) detrended data, and both
δ̂r,β = −14.77 and τ̂r,β = −2.77 suggest non-rejection of the null. The RUR tests
now compare the sample value with the 95% quantile (that is, they become
right-sided tests), and both lead to rejection of a driftless random walk in favour
of a trended alternative. Also the rank-score test, λlr,μ , leads to non-rejection, as
does the variance ratio test νrat,β .
KM tests
There are two remaining issues. First, for the RUR tests, the question is whether
the rejection arises from a drifted random walk or a trend stationary process.
To assess this issue, the auxiliary procedure suggested in Section 2.4.4 is the
next step. This procedure involves a second application of the RUR test, but
this time on a variable that is based on a sum of the variable in the first test, referred to in Section 2.4.4 as x̃t^(1). The resulting sample values of the R^F_UR and R^FB_UR test statistics applied to x̃t^(1) are now 2.98 and 4.30 respectively; these values
strongly indicate non-rejection of the null hypothesis of a unit root as the 5%
quantiles are (approximately) 1.31 and 1.90 respectively; moreover, there is now
no rejection in the upper tail, as the 95% quantiles are (approximately) 3.29 and
4.65 respectively. In terms of the decisions outlined in Section 2.4.4, we find
in favour of possibility A, that is, the drifted integrated process rather than the
trend stationary process.
Second, non-rejection does not indicate the transformation of the series that
is appropriate. If the choice is limited to either a linear integrated or a log-linear
integrated process, the KM tests are relevant. In this case, the tests referred to
as V1 and V2 , which allow for a drifted integrated process, are reported in Table
2.19. The first of these leads to rejection of the null of linearity, whereas the
second suggests non-rejection of the null of log-linearity.
A practical problem for empirical research often relates to the precise choice
of the form of the variables to be used. This can be a key issue, even though
quite frequently there is no clear rationale for a particular choice; a leading
case, but of course by no means the only one, is whether a time series should
be modelled in levels (the ‘raw’ data) or transformed into logarithms. There
are very different implications for choosing a linear integrated process over a
log-linear integrated process, since the former relates to behaviour of a random
walk type whereas the other relates to an exponential random walk. A two-stage
procedure was suggested in this chapter as a way of addressing this problem. In
the first instance, it is necessary to establish that the null hypothesis of a unit
root is not rejected for some transformation of the data; this stage may conclude
that there is an integrated process generating the data, and the second stage
decision is then to assess whether this is consistent with a linear or log-linear
integrated process. Of course this addresses just two possible outcomes, linear or log-linear; other transformations may also be candidates, although these are the leading ones.
The second strand to this chapter was to extend the principle for the construc-
tion of unit root tests from variants of direct estimation of an AR model, as in
the DF and ADF tests, to include tests based on the rank of an observation in a
set and the range of a set of observations. Neither of these principles depended
on a particular parametric structure. Another framework for constructing a unit
root test without a complete parametric structure is based on the behaviour of
the variance of a series with a unit root. Variance-ratio tests are familiar from
the work of Lo and MacKinlay (1988), but the basic test is difficult to extend
for non-iid errors; however, a simple and remarkably robust test due to Breitung
(2002) overcomes this problem.
Other references of interest in this area include Delgado and Velasco (2005),
who suggested a test based on signs that is applicable to testing for a unit root
as well as other forms of nonstationarity. Hasan and Koenker (1997) suggested
a family of rank tests of the unit root hypothesis. Charles and Darné (1999)
provided an overview of variance ratio tests of the random walk. Wright (2000)
devised variance-ratio tests using ranks and signs. Tse, Ng and Zhang (2004) have
developed a non-overlapping version of the standard VR test. Nielsen (2009)
develops a simple nonparametric test that includes Breitung’s (2002) test as a
special case and has higher asymptotic local power.
Questions
Q2.2. Consider the ADF version of the rank-based test and explain how to obtain
the equivalent DF-type test statistics.
A2.2. In the unaugmented (DF) case, the required statistic is just:
τ̂R,μ = γ̂r / σ̂(γ̂r)    (A2.2)

where τ̂R,μ is the usual t-type test statistic; the extension of δ̂R,μ to the rank-ADF case requires a correction, as in the usual ADF case, that is:

δ̂R,μ = T (ρ̂r − 1) / (1 − ĉr(1))    (A2.3)
where ĉr(1) ≡ Σ_{j=1}^{k−1} ĉr,j . Note that, without formal justification, a detrended
version of this test was used in the application to air revenue passenger miles in
Section 2.6.3.ii.
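As an illustration only (this is not the authors' code, and the exact regression specification is an assumption), the rank-based statistics can be computed by running the usual (A)DF regression on the centred ranks; since the mean of the ranks is known, no intercept is needed:

```python
import numpy as np

def rank_adf(y, k=0):
    """Illustrative rank-(A)DF regression: replace y_t by its centred rank
    r_t, then regress the rank differences on the lagged rank level and k
    lagged rank differences.  Returns (delta, tau) in the spirit of
    (A2.2)-(A2.3)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    r = np.argsort(np.argsort(y)) + 1.0 - (T + 1.0) / 2.0  # centred ranks
    dr = np.diff(r)
    cols = [r[k:-1]]                        # lagged level r_{t-1}
    for j in range(1, k + 1):
        cols.append(dr[k - j:len(dr) - j])  # lagged differences
    X = np.column_stack(cols)
    Y = dr[k:]
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    s2 = e @ e / (len(Y) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    tau = b[0] / se                          # t-type statistic, as in (A2.2)
    delta = T * b[0] / (1.0 - b[1:].sum())   # normalised bias with ADF correction
    return delta, tau
```

For a stationary series both statistics are large and negative, while for a random walk they lie near the usual DF acceptance region.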
Q2.3. Develop a variance ratio test based on the signs; see for example, Wright
(2000).
A2.3. The signs define a simple transformation of the original series. If xt is a random variable, then the sign function for xt is as follows:

S(xt) = 2I(xt > 0) − 1    (A2.4)

Thus, if xt > 0, then I(xt > 0) = 1 and S(xt) = 1, and if xt ≤ 0 then I(xt > 0) = 0 and S(xt) = −1. S(xt) has zero expectation and unit variance.
Wright (2000) suggested a sign-based version of a variance-ratio test. Assume that xt has a zero mean; then S(xt) replaces xt in the definition of the variance ratio, to give:

OVRsign = { [(Tq)⁻¹ Σ_{t=q+1}^{T} (Σ_{j=0}^{q−1} s_{t−j})²] / [T⁻¹ Σ_{t=1}^{T} s_t²] − 1 } [2(2q − 1)(q − 1)/(3qT)]^{−1/2}    (A2.5)

where s_{t−j} ≡ S(x_{t−j}). The test takes this form because the identity used in the parametric form of the test, Δq yt ≡ Σ_{j=0}^{q−1} Δ1 yt−j, does not carry over to the sign function; that is, S(Δq yt) ≠ Σ_{j=0}^{q−1} s_{t−j} in general.
This test could be applied to the case where xt ≡ yt and the null model
is yt = σt εt , with assumptions that: (i) σt and εt are independent condi-
tional on the information set, It = {yt , yt−1, . . .}, and (ii) E(εt |It−1 ) = 0. The first
assumption allows heteroscedasticity, for example of the ARCH/GARCH form
or stochastic volatility type. Note that these assumptions do not require εt to
be iid.
There are variations on these tests. For example, the sign-based test is easily
extended if the null hypothesis is of a random walk with drift, and rank-based
versions of these test can be obtained by replacing the signs by ranks (see
Wright (op. cit.), who simulates the finite sample distributions and provides
some critical values for different T and q).
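As a concrete illustration, the statistic in (A2.5) can be computed directly from the signs of the one-period changes. The sketch below is illustrative, not Wright's own code: the function name and the exact window convention (summing over t = q + 1, …, T, as in the formula above) are assumptions.

```python
import numpy as np

def ovr_sign(x, q):
    """Sign-based variance-ratio statistic, a sketch of (A2.5).

    x : one-period changes x_t, t = 1, ..., T
    q : aggregation horizon (q >= 2)
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    s = np.where(x > 0, 1.0, -1.0)      # S(x_t) = 2*I(x_t > 0) - 1
    # numerator: average of squared q-period sums of signs, t = q+1, ..., T
    windows = np.array([s[t - q:t].sum() for t in range(q + 1, T + 1)])
    num = (windows ** 2).sum() / (T * q)
    den = (s ** 2).sum() / T            # identically 1, since each s_t is +1 or -1
    scale = (2 * (2 * q - 1) * (q - 1) / (3 * q * T)) ** -0.5
    return (num / den - 1.0) * scale
```

Because s_t² = 1, the denominator is identically one; in practice, as noted above, Wright (op. cit.) simulates the finite-sample distribution of such statistics rather than relying on an asymptotic approximation.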
Introduction
This chapter is the first of two that consider the case of fractional values of the
integration parameter. That is suppose a process generates a time series that
is integrated of order d, I(d), then the techniques in UR, Vol. 1, were wholly
concerned with the case where d is an integer, the minimum number of dif-
ferences necessary, when applied to the original series, to produce a series that
is stationary. What happens if we relax the assumption that d is an integer?
There has been much recent research on this topic, so that the approach in
these two chapters must necessarily be selective. Like so many developments in
the area of time series analysis one again finds influential original contributions
from Granger; two of note in this case are Granger (1980) on the aggregation of
‘micro’ time series into an aggregate time series with fractional I(d) properties,
and Granger and Joyeux (1980) on long-memory time series.
One central property of economic time series is that the effect of a shock is
often persistent. Indeed in the simplest random walk model, with or without
drift, a shock at time t is incorporated or integrated into the level of the series
for all periods thereafter. One can say that the shock is always ‘remembered’ in
the series or, alternatively, that the series has an infinite memory for the shock.
This infinite memory property holds for more general I(1) processes, with AR
or MA components. What will now be of interest is what happens when d is
fractional, for example d ∈ (0, 1.0). There are two general approaches to the
analysis of fractional I(d) process, which may be analysed either in the time
domain or in the frequency domain. This chapter is primarily concerned with
the former, whereas the next chapter is concerned with the latter.
This chapter progresses by first spending some time on the definition of a
fractionally integrated process, where it is shown to be operational through
the binomial expansion of the fractional difference operator. The binomial
expansion of (1 − L)d is an elementary but essential tool of analysis, as is its
not converge to a finite constant; in short, they are nonsummable. Of course,
this property also applies to the autocorrelations, since ρ(k) = γ(k)/γ(0). However,
an I(d) process is still stationary for d ∈ (0, 0.5), only becoming nonstationary
for d ≥ 0.5. Even though the ρ(k) are not summable for d ≥ 0.5, the coefficients
in the MA representation of the process do tend to zero provided d < 1 and
this is described as ‘nonpersistence’. Thus, a process could be nonstationary
but nonpersistent, and care has to be taken in partitioning the parameter space
according to these different properties.
Having established that we can attach a meaning to $(1-L)^d y_t$, a question
that arises rather naturally is whether such processes are likely to occur in an
economic context. Two justifications are presented here. The first relates to the error
duration model, presented in the form due to Parke (1999), but with a longer
history due, in part, to the 'micropulse' literature, with important contributions by
Mandelbrot and his co-authors (see, for example, Cioczek-Georges and Mandel-
brot, 1995, 1996, and Mandelbrot and Van Ness, 1968). The other justification
is due to Granger (1980), who presented a model of micro relationships which
were not themselves fractionally integrated, but which on aggregation became
fractionally integrated.
We also consider the estimation of d in the time domain and, particularly,
hypothesis testing. One approach to hypothesis testing is to extend the DF
testing procedure to the fractional d case. This results in an approach that is
relatively easy to apply in a familiar framework.
The range of hypotheses of interest is rather wider than, but includes, the
unit root null hypothesis. In a fractional d context, the alternative to the unit
root null is not a stationary AR process, but a nonstationary fractional d process;
because, in this case, the alternative hypothesis is still one of nonstationarity,
another set-up of interest may be that of a stationary null against a nonstationary
alternative.
The sections in this chapter are arranged as follows. Section 3.1 is concerned
with the definition of a fractionally integrated process and some of its basic
properties. Section 3.2 considers the ARFIMA(p, d, q), where d can be fractional,
which is one of the leading long-memory models. Section 3.3 considers the
kind of models that can generate fractional d. Section 3.4 considers the situa-
tion in which a Dickey-Fuller type test is applied when the process is fractionally
3.1 A fractionally integrated process
One could add that $(1-L)^{d-1} y_t = v_t$, $v_t \sim I(1)$, to indicate that d is the minimum
number of differences necessary for the definition to hold. A special case of (3.1)
is where ut = εt and {εt} is a white noise sequence with zero mean and constant
variance $\sigma_{\varepsilon}^2$.
In a sense defined in the next section it is possible to relax the assumption
that d is an integer and interpret (3.1) accordingly either as an I(d) process for
yt or as a unit root process with fractional noise of order I(d − 1).
consider the binomial expansion of the fractional differencing operator $(1-L)^d$
and, in particular, the AR and MA coefficients associated with this operator:
$$(1 - L)^d = \sum_{r=0}^{\infty} A_r^{(d)} L^r$$
where:
$$A_0^{(d)} \equiv 1, \qquad A_r^{(d)} \equiv (-1)^r\,{}_{d}C_{r} \qquad (3.4)$$
$${}_{d}C_{r} \equiv \frac{(d)_r}{r!} = \frac{d!}{r!\,(d-r)!}, \qquad {}_{d}C_{0} \equiv 1, \quad 0! \equiv 1 \qquad (3.5a)$$
$$(d)_r = d(d-1)(d-2)\cdots(d-(r-1)) = d!/(d-r)! \qquad (3.5b)$$
In the integer case, the binomial coefficient ${}_{d}C_{r}$ is the number of ways of choosing
r from d without regard to order. However, in the fractional case, d is not an
integer and this interpretation is not sustained, although ${}_{d}C_{r}$ is defined in the
same way.
Applying the operator $(1-L)^d$ to yt results in:
$$(1-L)^d y_t = y_t + \sum_{r=1}^{\infty} (-1)^r\,{}_{d}C_{r}\, y_{t-r} = A^{(d)}(L)\, y_t \qquad (3.6)$$
$$A^{(d)}(L) \equiv \sum_{r=0}^{\infty} A_r^{(d)} L^r \qquad (3.7)$$
$A^{(d)}(L)$ is the AR polynomial associated with $(1-L)^d$. Using the AR polynomial,
the model of Equation (3.1) with fractional d can be represented as the following
infinite autoregression:
$$y_t = -\sum_{r=1}^{\infty} (-1)^r\,{}_{d}C_{r}\, y_{t-r} + u_t \qquad (3.8a)$$
$$\;\;\; = -\sum_{r=1}^{\infty} A_r^{(d)}\, y_{t-r} + u_t \qquad (3.8b)$$
$$\;\;\; = \sum_{r=1}^{\infty} \pi_r^{(d)}\, y_{t-r} + u_t \quad \text{where } \pi_r^{(d)} \equiv -A_r^{(d)} \qquad (3.8c)$$
3.1.2.ii MA coefficients
Also of interest is the MA representation of Equation (3.1); operating on
the left-hand side of (3.1) with $(1-L)^{-d}$ gives:
$$y_t = (1-L)^{-d} u_t = B^{(d)}(L)\, u_t \qquad (3.9)$$
where:
$$B_0^{(d)} = 1, \qquad B_r^{(d)} \equiv (-1)^r\,{}_{-d}C_{r} \qquad (3.11)$$
$$B^{(d)}(L) \equiv \sum_{r=0}^{\infty} B_r^{(d)} L^r \qquad (3.13)$$
There are recursions for both the AR and the MA coefficients (for details see
Appendix 3.1 at the end of this chapter), respectively, as follows for r ≥ 1:
$$A_r^{(d)} = [(r - d - 1)/r]\,A_{r-1}^{(d)} \qquad (3.14)$$
$$B_r^{(d)} = [(r + d - 1)/r]\,B_{r-1}^{(d)} \qquad (3.15)$$
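The recursions (3.14) and (3.15) are easy to implement and provide a numerical check that $A^{(d)}(L)$ and $B^{(d)}(L)$ are mutually inverse polynomials: their Cauchy product should reproduce the identity. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def frac_coeffs(d, n):
    """AR coefficients A_r^(d) and MA coefficients B_r^(d), r = 0, ..., n,
    from the recursions (3.14) and (3.15) with A_0 = B_0 = 1."""
    A = np.ones(n + 1)
    B = np.ones(n + 1)
    for r in range(1, n + 1):
        A[r] = (r - d - 1) / r * A[r - 1]
        B[r] = (r + d - 1) / r * B[r - 1]
    return A, B

A, B = frac_coeffs(0.4, 50)
# Cauchy product of the two polynomials: should be (1, 0, 0, ...)
identity = np.convolve(A, B)[:51]
```

For d = 0.4 the first-order terms are A₁ = −0.4 and B₁ = 0.4, and the convolution is one in position zero and (numerically) zero elsewhere.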
The notation $\Gamma(k)$ is used for the general case and $\Gamma(n)$ for an integer argument. The Gamma
function interpolates the factorials between integers, so that for integer n ≥ 0,
$\Gamma(n+1) = n!$. The property (n + 1)! = (n + 1)n!, which holds for integer
n ≥ 0, also holds for the Gamma function without the integer restriction; this
is written as $\Gamma(n+2) = (n+1)\Gamma(n+1)$.
In the notation of this chapter, the binomial coefficients defined in (3.5) are:
$${}_{d}C_{r} = \frac{(d)_r}{r!} = \frac{\Gamma(d+1)}{\Gamma(r+1)\,\Gamma(d-r+1)} \qquad (3.17)$$
The autoregressive and moving average coefficients are given, respectively, by:
$$A_r^{(d)} = \frac{\Gamma(r-d)}{\Gamma(r+1)\,\Gamma(-d)} \qquad (3.18)$$
$$B_r^{(d)} = \frac{\Gamma(r+d)}{\Gamma(r+1)\,\Gamma(d)} \qquad (3.19)$$
Using Stirling's theorem, it can be shown that $\Gamma(r+h)/\Gamma(r+1) = O(r^{h-1})$;
applying this for h = −d and h = d, respectively, then as r → ∞, we have:
$$A_r^{(d)} \sim \frac{1}{\Gamma(-d)}\, r^{-d-1} \quad \text{(AR coefficients)} \qquad (3.20)$$
$$B_r^{(d)} \sim \frac{1}{\Gamma(d)}\, r^{d-1} \quad \text{(MA coefficients)} \qquad (3.21)$$
The AR coefficients will not (eventually) decline unless d > −1 and the MA
coefficients will not (eventually) decline unless d < 1. Comparison of the speed of
decline is interesting. For 0 < d < 1, the AR coefficients (asymptotically) decline
faster than the MA coefficients, whereas for d < 0, the opposite is the case. For
given d, the relative rate of decline of $A_r^{(d)}/B_r^{(d)}$ is governed by $r^{-2d}$, which can
be substantial for large r and even quite modest values of d. For example, for
d = 0.5 the relative decline is of the order of $r^{-1}$, whereas for d = −0.5 it is the
other way around. This observation explains the visual pattern in the figures
reported below in Section 3.1.5.
Notice that the MA coefficients $B_r^{(d)}$ for given d are the same as the AR
coefficients $A_r^{(d)}$ for −d, and vice versa. To see this, note that for $d = \bar{d}$,
$A_r^{(\bar{d})} = (-1)^r\,{}_{\bar{d}}C_{r}$ and $B_r^{(\bar{d})} = (-1)^r\,{}_{-\bar{d}}C_{r}$, whereas for $d = -\bar{d}$,
$A_r^{(-\bar{d})} = (-1)^r\,{}_{-\bar{d}}C_{r}$ and $B_r^{(-\bar{d})} = (-1)^r\,{}_{\bar{d}}C_{r}$.
The MA coefficients trace out the impulse response function for a unit shock
in εt . A point to note is that even though the sequence {yt } is nonstationary for
d ∈ [½, 1), the MA coefficients eventually decline. This property is often referred
to as mean reversion, but since the variance of yt is not finite for d ≥ 0.5, this term
There are two ways of approaching the definition of a fractional I(d) process,
referred to as partial summation and the direct approach (or definition), and
these are considered in the next two sections.
The difference from the standard unit root process is that the zt are generated
by an I(d − 1) process given by:
$$(1-L)^{d-1} z_t = u_t \;\Rightarrow\; z_t = (1-L)^{1-d} u_t = \sum_{r=0}^{\infty} B_r^{(d-1)} u_{t-r} \quad \text{for } d \in [1/2, 3/2) \qquad (3.25)$$
The increments zt are I(d − 1), with (d − 1) < ½, and the MA coefficients $B_r^{(d-1)}$
are defined in (3.19). Also notice that if d ∈ (½, 1), then d − 1 ∈ (−½, 0), so that zt,
on this definition, exhibits negative autocorrelation and is referred to as being
anti-persistent.
Noting that (1 − L)−1 is the summation operator and zt = 0 for t < 1, then
(3.26) can be rewritten as follows:
$$(1-L)^d (y_t - y_0) = u_t, \quad t \geq 1 \qquad \text{[multiply through by } (1-L)^{d-1}\text{]} \qquad (3.28)$$
where:
Note that the partial summation approach nests the unit root framework that is
associated with integer values of d. For purposes of tying in this approach with
the corresponding fractional Brownian motion (fBM) this will be referred to as
a type I fractional process.
Of particular interest for the direct definition of an I(d) process is the MA form
truncated to reflect the start of the process at t = 1, that is:
$$y_t - y_0 = (1-L)^{-d} u_t = \sum_{r=0}^{t-1} B_r^{(d)} u_{t-r}, \quad t \geq 1 \qquad (3.32)$$
This is a well-defined expression for all values of d, not just d ∈ (−½, ½).
The direct summation approach leads to what will be referred to as a type II
fractional process. Where it is necessary to distinguish between yt generated
according to the different definitions, they will be denoted $y_t^{(I)}$ and $y_t^{(II)}$,
respectively.
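For simulation, the truncated MA form (3.32) gives a direct way of generating a type II series from a sequence of shocks. The sketch below (function name illustrative) builds the $B_r^{(d)}$ weights by the recursion (3.15) and forms the truncated convolution; as a check, for d = 1 it reduces to the cumulative sum of the shocks, and for d = 0 it returns the shocks themselves.

```python
import numpy as np

def type2_frac(u, d):
    """Type II fractionally integrated series from (3.32):
    y_t - y_0 = sum_{r=0}^{t-1} B_r^(d) u_{t-r}, shocks before t = 1 set to zero."""
    u = np.asarray(u, dtype=float)
    T = len(u)
    B = np.ones(T)
    for r in range(1, T):
        B[r] = (r + d - 1) / r * B[r - 1]      # recursion (3.15)
    # truncated convolution: only shocks u_1, ..., u_t enter y_t
    return np.array([np.dot(B[:t + 1], u[t::-1]) for t in range(T)])
```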
where the term ξ0(d) captures the presample values of ut; this term is of the
same stochastic order as $y_t^{(II)}$ for d ∈ [1/2, 3/2), which means that it cannot be
ignored asymptotically, so that the initialisation, or how the pre-sample values
are treated, does matter.
where r, s ∈ [0, 1], |d| < ½, and W(s) is regular Brownian motion (BM) with variance
$\sigma_{lr}^2$, which is a standard BM if $\sigma_{lr}^2 = 1$, in which case it is usually referred to as B(s).
The unit root case is worth recalling briefly. Consider the simple random walk
$y_t = \sum_{j=1}^{t} z_j$, t = 1, …, T, which is an example of a partial sum process based on
the 'increments' zt ∼ I(0), which may therefore exhibit weak dependence and,
for simplicity, the initial value y0 has been set to zero. This can be written
equivalently by introducing the indexing parameter r:
$$y_T(r) = \sum_{t=1}^{[rT]} z_t \qquad (3.36)$$
The notation [rT] indicates the integer part of rT; thus, rT is exactly an integer
for r = j/T, j = 1, …, T and, for these values, gives the random walk over the
integers. However, yT(r) can be considered as a continuous function of r, albeit
a step function, with the 'steps' becoming increasingly smaller as T → ∞,
so that, in the limit, yT(r) is a continuous function of r. To consider this limit,
yT(r) is first normalised by $\sigma_{lr,z}\sqrt{T}$, where $\sigma_{lr,z}^2 < \infty$ is the 'long-run' variance of
zt, so that:
$$Y_T(r) \equiv \frac{y_T(r)}{\sigma_{lr,z}\sqrt{T}} \qquad (3.37)$$
$$y_T^{(I)}(r) = \sum_{t=1}^{[rT]} z_t \quad \text{where } z_t \sim I(d),\ |d| < 1/2,\ d \neq 0$$
and $y_T^{(I)}(r)$ is then normalised as follows:
$$Y_T^{(I)}(r) = \frac{y_T^{(I)}(r)}{\sigma_{lr,z}\, T^{1/2+d}} \qquad (3.39)$$
where $Y^{(I)}$ is defined in (3.34). An analogous result holds for a type II process:
$$Y_T^{(II)}(r) = \frac{y_T^{(II)}(r)}{\sigma_{lr,z}\, T^{1/2+d}} \;\Rightarrow_{D}\; Y^{(II)} \qquad (3.41)$$
The impact of the different forms of fBM on some frequently occurring esti-
mators is explored by Robinson (2005). (For example, if the properties of an
estimator of, say, d are derived assuming type I fBM, what are its properties if
type II fBM is assumed?)
$$(1-L)^d \tilde{y}_t = u_t, \qquad \tilde{y}_t \equiv y_t - \mu_t \qquad (3.43)$$
$$(1-L)^d \tilde{y}_t = \varepsilon_t 1_{(t>0)}, \quad t = 1, \ldots, T \qquad (3.44)$$
$$(1-L)^d \tilde{y}_t = \varepsilon_t, \quad t = -\infty, \ldots, 0, 1, \ldots, T \qquad (3.45)$$
As noted these two representations give rise to type I and type II fractional
Brownian motion (fBM) where the latter is associated with the truncated process
and the former with the non-truncated process; see Marinucci and Robinson
(1999), and Davidson and Hashimzade (2009), and see Section 3.1.5.
Recall that to write t = 1 is a matter of notational convenience; in practice,
the sample starting date will be a particular calendar date, for example, 1964q1,
which raises the question, for actual economic series, of whether it is reasonable
to set all shocks before the particular start date at hand to zero. As Davidson
and Hashimzade (2009) note, whilst in time series modelling such a truncation
often does not matter, at least asymptotically, in this case it does. This issue is
of importance in, for example, obtaining the critical values for a particular test
statistic that is a functional of type I fBM: it would be natural enough to
start the process at some specific date in a simulation setting, but type II fBM
may be a better approximation for 'real world' data. See Robinson (2005) for a
discussion of some of the issues.
(1 − L)yt = ut (3.47a)
(1 − L)d ut = w(L)εt (3.47b)
Provided that the roots of φ(L) are outside the unit circle (see UR, Vol. 1,
chapter 2), the ARFIMA(p, d, q) model can be inverted to obtain the moving
average representation given by:
(The polynomials in (3.46) are scalar so the rearrangement of their order in
(3.48) is valid.) The moving average polynomial is ω(L) = φ(L)−1 θ(L)(1 − L)−d ,
which is now the convolution of three polynomials.
$$\gamma(k) = \frac{(-1)^k\,(-2d)!}{(k-d)!\,(-k-d)!}\,\sigma_{\varepsilon}^2 \qquad (3.49a)$$
$$\;\;\; = \frac{(-1)^k\,\Gamma(1-2d)}{\Gamma(k-d+1)\,\Gamma(1-k-d)}\,\sigma_{\varepsilon}^2 \qquad (3.49b)$$
The variance is obtained by setting k = 0 in (3.49a) or (3.49b):
$$\gamma(0) = \frac{(-2d)!}{[(-d)!]^2}\,\sigma_{\varepsilon}^2 \qquad (3.50)$$
Note that the variance, γ(0), is not finite for d ≥ 0.5. Throughout it has been
assumed that µt = 0, otherwise yt should be replaced by yt − µt .
3.2.1.ii Autocorrelations
The lag k autocorrelations, ρ(k) ≡ γ(k)/γ(0), are given by:
$$\rho(k) = \frac{(-d)!\,(k+d-1)!}{(d-1)!\,(k-d)!}, \quad k = 0, \pm 1, \ldots \qquad (3.51a)$$
$$\;\;\; = \frac{d(1+d)\cdots(k-1+d)}{(1-d)(2-d)\cdots(k-d)} \qquad (3.51b)$$
$$\;\;\; = \frac{\Gamma(1-d)\,\Gamma(k+d)}{\Gamma(d)\,\Gamma(k+1-d)} \qquad (3.51c)$$
As k → +∞, then $\Gamma(k+d)/\Gamma(k+1-d) \to k^{d-(1-d)} = k^{2d-1}$, therefore:
$$\rho(k) \to \frac{\Gamma(1-d)}{\Gamma(d)}\, k^{2d-1} \qquad (3.52)$$
As γ(k) and ρ(k) are related by a constant (the variance), they have the same
order, namely $O(k^{2d-1})$.
The first-order autocorrelation coefficient for d < 0.5 is obtained by setting
k = 1 in (3.51a), that is:
$$\rho(1) = \frac{(-d)!\,d!}{(d-1)!\,(1-d)!} = \frac{d}{1-d} \qquad (3.53)$$
Note that ρ(1) → 1 as d → 0.5. Application of (3.53) for d > 0.5 results in ρ(1) > 1,
which is clearly invalid for an autocorrelation.
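These closed forms are straightforward to evaluate with the Gamma function. The sketch below (function name illustrative) implements (3.51c); for d = 1/3 it reproduces ρ(1) = d/(1 − d) = 0.5, and the slow hyperbolic decay is evident at longer lags.

```python
from math import gamma

def rho(k, d):
    """Autocorrelation of fractional noise, (3.51c), for d < 0.5 and d != 0:
    rho(k) = Gamma(1 - d) Gamma(k + d) / (Gamma(d) Gamma(k + 1 - d))."""
    return gamma(1 - d) * gamma(k + d) / (gamma(d) * gamma(k + 1 - d))
```

For comparison, an AR(1) process with the same first-order autocorrelation of 0.5 has ρ(10) = 0.5**10 ≈ 0.001, whereas here ρ(10) ≈ 0.23.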
From (3.52), the behaviour of the sum of the autocorrelations will be determined
by the term $k^{2d-1}$. Clearly, such a sum will not be finite if d ≥ 0.5
because the exponent on k will then be greater than or equal to 0, but is d < 0.5 a sufficient
condition? The answer is no. The condition for its convergence can be established
by the p-series convergence test (a special case of the integral test); write
$k^{2d-1} = 1/k^{-(2d-1)}$, then convergence requires that −(2d − 1) > 1, that is d < 0.
(The case d = 0 is covered trivially as all ρ(k) apart from ρ(0) are zero.) It is this
nonsummability aspect that is sometimes taken as the defining feature of 'long
memory'.
$$\rho(k)_{\mathrm{inv}} \to \frac{\Gamma(d+1)}{\Gamma(-d)}\, |k|^{-2d-1} \qquad (3.55)$$
Note that the inverse autocorrelations are obtained by replacing d by −d in the
standard autocorrelations (hence $\rho(k\,|\,d)_{\mathrm{inv}} = \rho(k\,|\,{-d})$).
[Figure panels omitted: plots of the autocorrelations, inverse autocorrelations and their sums against lag k, for d = 0.45, 0.3, −0.3 and −0.45.]
The slow decline and non-summability are more evident as d moves toward the
nonstationary boundary of 0.5. The autocorrelations for negative d are slight,
decline quickly and are summable.
The corresponding MA and AR coefficients and their respective sums are
shown in Figures 3.2a–3.2d and Figures 3.3a–3.3d, respectively. These are plots
of the $B_r^{(d)}$ coefficients from (3.15) and the $A_r^{(d)}$ coefficients from (3.14), respectively.
Again it is evident from the positive and declining MA coefficients in
Figure 3.2a that positive values of d are more likely to characterize economic
time series than negative values of d, which, as Figure 3.2b shows, generate
negative MA coefficients. Note also that the AR coefficients, and their respective
sums, in Figures 3.3a to 3.3d are simply the negative of the corresponding
MA coefficients in Figures 3.2a to 3.2d.
To give a feel for what time series generated by fractional d look like, simulated
time series plots are shown in Figures 3.4a to 3.4d. In these illustrations d = −0.3
(to illustrate 'anti-persistence'), then d = 0.3 and d = 0.45 each illustrate some
persistence, and d = 0.9 illustrates the nonstationary case. In addition, to mimic
economic time series somewhat further, some serial correlation is introduced
into the generating process; in particular, the model is now ARFIMA(1, d = 0.9,
0) with φ1 = 0.3, 0.5, 0.7 and 0.9 in Figures 3.5a to 3.5d, respectively. As the degree
of serial correlation increases, the series becomes 'smoother' and characterises
well the typical features of economic time series.
[Figures 3.4a–3.4d and 3.5a–3.5d omitted: simulated time series, T = 500; as φ1 increases the series becomes 'smoother'.]
The essence of the error duration model, EDM, is that there is a process gen-
erating stochastic shocks in a particular economic activity that have a lifetime,
or duration, which is itself stochastic, and that the generic time series variable
yt is the sum of those shocks surviving to period t. The probability of a shock
originating in period s surviving to period t, referred to as the survival prob-
ability, is critical in determining whether the process is one with inherently
long memory (defined as having nonsummable autocovariances). Parke (op.
cit.) gives two examples of the EDM. In the first, shocks to aggregate employ-
ment originate from the birth and death of firms; if a small proportion of firms
have a very long life, then long memory is generated in aggregate employment.
The second example concerns financial asset positions: once an asset position is
taken, how long will it last? This is another example of a survival probability; if
some positions, once taken, are held on a long term basis, there is the potential
for a process, for example that of generating volatility of returns, to gener-
ate long memory. This line of thought generates a conceptualisation in which
events generate shocks, for example productivity shocks, oil price shocks, stock
market crashes, which in turn affect and potentially impart long memory into
economic activity variables, such as output and employment. Equally, one can
regard innovations as ‘shocks’ in this sense; for example, mobile phone technol-
ogy, plasma television and computer screens, and then consider whether such
shocks have a lasting effect on consumer expenditure patterns.
shock has a survival period; it originates in period s and survives until period
s + ns. The duration or survival period of the shock, (s + ns) − (s − 1) = ns + 1
periods, is a random variable. Define a function gs,t that indicates whether a shock
originating in period s survives to period t; thus, gs,t = 0 for t > s + ns, and
gs,t = 1 for s ≤ t ≤ s + ns; the function is, therefore, a form of indicator function. The shock
and its probability of survival are assumed to be independent. The probability
that the shock survives until period s +k is pk = p(gs,s+k = 1), which defines
a sequence of probabilities {pk } for k ≥ 0. For k = 0, corresponding to gs,s = 1,
assume p0 = 1, so that a shock lasts its period of inception; also, assume that the
sequence of probabilities {pk } is monotone non-increasing. The {pk } are usually
referred to as the survival probabilities. A limiting case is when the process has
no memory, so that pk = p(gs,s+k = 1) = 0 for k ≥ 1. Otherwise, the process has
a memory characterised by the sequence {pk}, and the interesting question is
the persistence of this memory, and whether it qualifies, in the sense of the McLeod
and Hipel (1978) definition, to be described as long memory.
Finally, the realisation of the process at time t is yt, which is the sum of all the
errors that survive until period t; that is:
$$y_t = \sum_{s=-\infty}^{t} g_{s,t}\, \xi_s \qquad (3.56)$$
$$\gamma(1) = p_1 + p_2 + p_3 + \cdots$$
$$\gamma(2) = p_2 + p_3 + p_4 + \cdots$$
$$\vdots$$
$$\gamma(k) = p_k + p_{k+1} + p_{k+2} + \cdots \qquad (3.57)$$
The pattern is easy to see from these expressions, and the sum of the autocovariances,
for k = 1, …, n, is:
$$\sum_{k=1}^{n} \gamma(k) = p_1 + 2p_2 + 3p_3 + \cdots = \sum_{k=1}^{n} k\,p_k \qquad (3.58)$$
Hence, the question of long memory concerns whether $\sum_{k=1}^{n} k\,p_k$ converges as n → ∞.
(Omitting γ(0) from the sum does not affect the condition for convergence.) The
survival probabilities are assumed not to increase; let them be characterised as:
$$p_k = c\,k^{\alpha} \qquad (3.59)$$
where c is a positive constant and α < 0, such that $0 \leq ck^{\alpha} \leq 1$; α > 0 is ruled out
as otherwise pk > 1 is possible for k large enough. (All that is actually required is
that the expression for pk in (3.59) holds in the limit as k → ∞.)
The individual terms in $\sum_{k=1}^{n} k\,p_k$ are given by $k\,p_k = k(ck^{\alpha}) = ck^{1+\alpha}$. The
series to be checked for convergence is, therefore:
$$\sum_{k=1}^{n} \gamma(k) = c + 2c(2^{\alpha}) + 3c(3^{\alpha}) + \cdots + kc(k^{\alpha}) + \cdots$$
This outline shows that the error duration model can generate a series yt that is
integrated of order d, that is I(d). Hence, applying the d-th difference operator
$(1-L)^d$ to yt reduces the series to stationarity; that is, $(1-L)^d y_t = \varepsilon_t$, where
εt is iid.
$$p_k = \frac{\Gamma(k+d)\,\Gamma(2-d)}{\Gamma(k+2-d)\,\Gamma(d)} \qquad (3.61)$$
with the following recursion starting from p0 = 1:
$$p_{k+1} = \frac{k+d}{k+2-d}\; p_k \qquad (3.62)$$
Note that the pk in (3.61) satisfy (3.59) as a limiting property. The recursion
shows that the ratio of successive probabilities pk+1 /pk , referred to as the condi-
tional survival probabilities, tends to one as k increases; thus, there is a tendency
for shocks that survive a long time to continue to survive. Figure 3.6 plots the
survival probabilities and the conditional survival probabilities for d = 0.15 and
d = 0.45. The relatively slow decline in the survival probabilities is more evident
in the latter case; in both cases pk+1 /pk → 1 as k → ∞.
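The closed form (3.61) and the recursion (3.62) are easily checked against each other; in the sketch below (function names illustrative), the closed form is written so that p₀ = 1 and the recursion (3.62) is satisfied.

```python
from math import gamma

def p_closed(k, d):
    """Survival probability, closed form as in (3.61)."""
    return gamma(k + d) * gamma(2 - d) / (gamma(k + 2 - d) * gamma(d))

def p_recursive(n, d):
    """Survival probabilities p_0, ..., p_n via the recursion (3.62), p_0 = 1."""
    p = [1.0]
    for k in range(n):
        p.append((k + d) / (k + 2 - d) * p[-1])
    return p
```

The conditional survival probability p_{k+1}/p_k = (k + d)/(k + 2 − d) increases monotonically towards one, which is the 'shocks that survive a long time tend to continue to survive' property noted above.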
[Figure 3.6 omitted: survival probabilities (k up to 30) and conditional survival probabilities (k up to 50) for d = 0.15 and d = 0.45; the conditional survival probabilities → 1 as k → ∞.]
Table 3.1 Empirical survival rates Sk and conditional survival rates Sk /Sk−1 for US firms
Year k ⇒ 1 2 3 4 5 6 7 8 9 10
Sk 0.812 0.652 0.538 0.461 0.401 0.357 0.322 0.292 0.266 0.246
Sk /Sk−1 0.812 0.803 0.826 0.857 0.868 0.891 0.902 0.908 0.911 0.923
Source: Parke (1999); based on Nucci’s (1999) data, which is from 5,727,985 active businesses in the
1987 US Bureau of the Census Statistical Establishment List.
3.3.1.v The survival probabilities in a short-memory, AR(1), process
It is useful to highlight the difference between the survival probabilities for a
long-memory process and those for a short-memory process, taking the AR(1)
model as an example of the latter. Consider the AR(1) model (1 − φ1L)yt = ηt,
where ηt is iid and 0 ≤ φ1 < 1. The autocorrelation function is $\rho(k) = \phi_1^k$, with
recursion ρ(k + 1) = φ1ρ(k). This matches the case where the conditional survival
probabilities are constant, that is, pk+1/pk = φ1 starting with p0 = 1; thus, p1 =
φ1p0, p2 = φ1p1 and so on. Hence, the survival probability pk is just the k-th
autocorrelation, ρ(k). The AR(1) model with φ1 = 0.5 and the I(d) model with
d = 1/3 have the same first order autocorrelation of 0.5, but the higher-order
autocorrelations decline much more slowly for the I(d) model.
Hence, solving for α and then d by taking logs and rearranging, we obtain:
$$\ln(p_k/p_{k+j}) = \ln\!\left[\left(\frac{k}{k+j}\right)^{\alpha}\right] \qquad (3.64)$$
$$\Rightarrow\; \alpha = \frac{\ln(p_k/p_{k+j})}{\ln[k/(k+j)]} \qquad (3.65)$$
$$\Rightarrow\; d = 1 + 0.5\,\frac{\ln(p_k/p_{k+j})}{\ln[k/(k+j)]} \qquad (3.66)$$
Given data, as in Table 3.1, an estimate, d̂, can be obtained by choosing k and j;
for example, k = 5 and j = 5 gives d̂ = 0.65, indicating long memory and nonstationarity.
With an estimate of d, an estimate of the scaling factor c can be
obtained from $c = p_k/k^{\alpha}$ for a choice of k. For d̂ = 0.65 and k = 10, Parke obtains
c = 1.25 by ensuring that p10 = S10, so the estimated model is:
$$p_k = 1.25\, k^{-2+2(0.65)} \qquad (3.67)$$
This estimated function is graphed in Figure 3.7, and fits the actual survival rates
particularly well from S5 onward, but not quite as well earlier in the sequence;
note also that $p_1 = 1.25(1)^{-2+2(0.65)} = 1.25 > 1$, which is invalid for a probability.
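Parke's calculation can be reproduced directly from the survival rates in Table 3.1, using (3.65) and (3.66) for d̂ and then anchoring c so that the fitted probability matches the observed rate at k = 10. A sketch (the data are the Sk values from Table 3.1; the function name is illustrative):

```python
from math import log

# Survival rates S_k for US firms, Table 3.1 (Parke, 1999; Nucci's data)
S = {1: 0.812, 2: 0.652, 3: 0.538, 4: 0.461, 5: 0.401,
     6: 0.357, 7: 0.322, 8: 0.292, 9: 0.266, 10: 0.246}

def estimate_d_c(k, j, anchor):
    """alpha from (3.65), d from (3.66), and c anchored so that p_anchor = S_anchor."""
    alpha = log(S[k] / S[k + j]) / log(k / (k + j))
    d = 1 + 0.5 * alpha
    c = S[anchor] / anchor ** alpha
    return d, c

d_hat, c_hat = estimate_d_c(5, 5, 10)   # k = 5, j = 5, anchor at S_10
```

With k = 5 and j = 5 this gives d̂ ≈ 0.65 and c ≈ 1.25, as in the text.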
An alternative to using just one choice of k and j, which might reduce the
problem of the estimated probabilities above one, is to average over all possible
choices. There are 9 combinations for j = 1, that is the set comprising the pairs
(1, 2), (2, 3), …, (9, 10); 8 combinations for j = 2, through to 1 combination for
j = 9, that is the pair (1, 10). In all there are $\sum_{j=1}^{9} j = 45$ combinations; evaluating
these and averaging, we obtain d̂ = 0.69 and ĉ = 1.08, so the fitted probability
function is:
$$p_k = 1.08\, k^{-2+2(0.69)} \qquad (3.68)$$
[Figure 3.7 omitted: actual survival rates and the probability functions estimated using data for S5 and S10, and using all the data, plotted against survival duration in years.]
This function fits the actual survival rates particularly well from S3 onward; the
problem with the starting probabilities has been reduced but not completely
removed. The survival rates and fitted values from the revised estimate are also
shown in Figure 3.7.
3.3.4 Aggregation
Granger (1980) showed that an I(d) process, for d fractional, could result from
aggregating micro relationships that were not themselves I(d). Chambers (1998)
has extended and qualified these results (and see also Granger, 1990). We deal
with the simplest case here to provide some general motivation for the genera-
tion of I(d) processes; the reader is referred to Granger (1980, 1990) for detailed
derivations and other cases involving dependent micro processes.
There are many practical cases where the time series considered in economics
are aggregates of component series; for example, total unemployment aggre-
gates both male unemployment and female unemployment, and each of these,
in turn aggregates different age groups of the unemployed or unemployment
chapter; and the reader unfamiliar with frequency domain concepts will need
to first review Sections 4.1.1 to 4.1.4.
where the εkt are zero-mean, independent white noise ‘shocks’ (across the com-
ponents and time). The aggregate variable, which in this case is just the sum of
these two series, is denoted yt , so that:
In Granger’s model, the AR(1) coefficients φk1 are assumed to be the outcomes
of a draw from a population with distribution function F(φ1 ). To illustrate the
argument, Granger (1980) uses the beta distribution, and whilst other distribu-
tions would also be candidates, the link between the beta distribution and the
gamma distribution is important in this context.
3.3.4.ii The AR(1) coefficients are ‘draws’ from the beta distribution
The beta distribution used in Granger (1980) is:

dF(φ1) = [2/B(u, v)] φ1^{2u−1} (1 − φ1²)^{v−1} dφ1    (3.73)
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
where 0 ≤ φ1 ≤ 1, u, v > 0, and B(u, v) is the beta function with param-
eters u and v. The coefficients φk1 , k = 1, 2, are assumed to be drawn
from this distribution. The beta function is B(u, v) = ∫₀¹ φ1^{u−1}(1 − φ1)^{v−1} dφ1 =
2∫₀¹ φ1^{2u−1}(1 − φ1²)^{v−1} dφ1, which operates here to scale the distribution func-
tion so that its integral is unity over the range of φ1. In this case, the k-th
autocovariance, γ(k), with v > 1, is given by:
γ(k) = B(u + k/2, v − 1) / B(u, v)    (3.74a)
     = Γ(v − 1)Γ(k/2 + u) / [B(u, v) Γ(k/2 + u + v − 1)]    (3.74b)

so that, for large k, the autocovariances decline hyperbolically, γ(k) ≈ C k^{1−v},
where C is a constant.
On comparison with (3.52), 1 − v = 2d − 1 and, therefore, d = 1 − (1/2)v; thus,
the aggregate yt is integrated of order 1 − (1/2)v; for example, if v = 1.5, then
d = 0.25. Typically, the range of interest for d in the case of aggregate time series
is the long-memory region, d ∈ (0, 1], which corresponds to v ∈ [0, 2), with
perhaps particular attention on v ∈ [0, 1], corresponding to d ∈ [0.5, 1].
Note that the parameter u does not appear in the order of integration; it
is v that is the critical parameter, especially for φ1 close to unity (see Granger,
1980).
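As a rough numerical check of this mechanism, the following sketch (illustrative parameter values, stdlib only) draws φ_k1² from a Beta(u, v) distribution — equivalent to drawing φ_k1 from the density (3.73) — and sums many independent AR(1) components; with v = 1.5 the implied order of integration of the aggregate is d = 1 − v/2 = 0.25:

```python
import random

def simulate_aggregate(n_units=200, T=400, u=1.0, v=1.5, seed=0):
    """Granger (1980)-style aggregation sketch: phi_k1^2 ~ Beta(u, v),
    so phi_k1 is a draw from the density (3.73); the sum of many
    independent AR(1) components behaves approximately as I(1 - v/2)."""
    rng = random.Random(seed)
    y = [0.0] * T
    for _ in range(n_units):
        phi = rng.betavariate(u, v) ** 0.5  # phi in [0, 1)
        x = 0.0
        for t in range(T):
            x = phi * x + rng.gauss(0.0, 1.0)
            y[t] += x
    return y

implied_d = 1.0 - 0.5 * 1.5  # d = 1 - v/2 = 0.25 for v = 1.5
```

The sample autocorrelations of such an aggregate decay much more slowly than those of any individual component, which is the long-memory signature discussed above.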
3.3.4.iii Qualifications
Granger (1990) considers a number of variations on the basic model including
changing the range of φ1 ; introducing correlation in the εit sequences, and
allowing the component series to be dependent; generally these do not change
the essential results. However, Granger (1990) notes that if there is an upper
bound to φ1, say φ̄1 < 1, so that P(φ1 ≥ φ̄1) = 0, then the fractional d result
no longer holds. Chambers (1998) shows that linear aggregation, for example
in the form of simple sums or weighted sums, does not by itself lead to an
aggregate series with long memory. As Chambers notes (op.cit.), long memory
in the aggregate requires the existence of long memory in at least one of the
components, and the value of d for the aggregate will be the maximum value of
d for the components. In this set-up, whilst the aggregation is linear, the values
of φk1 are not draws from a beta distribution.
Dickey-Fuller tests when the series is fractionally integrated
Given the widespread use of Dickey-Fuller tests, a natural question to ask is: what
are their characteristics when the time series is generated by a fractionally inte-
grated process? It is appropriate to start with the LS estimator ρ̂ in the following
simple model:
yt = ρy_{t−1} + εt    (3.76)

where the data are generated by the fractionally integrated process:

(1 − L)(yt − y0) = ut    (3.77a)
(1 − L)^d ut = εt    (3.77b)

Sowell's results for the limiting distribution of ρ̂ can then be stated as:

T(ρ̂ − 1) ⇒_D ½ B_d(1)² / ∫₀¹ B_d(r)² dr    for d ∈ (0, 0.5)    (3.78)

T^{(1+2d)}(ρ̂ − 1) ⇒_D −(d + ½)Γ(1 + d) / [Γ(1 − d) ∫₀¹ B_d(r)² dr]    for d ∈ (−0.5, 0)    (3.79)

where B_d(r) is a standard (type I) fBM. (This statement of Sowell's results embod-
ies the amendment suggested by Marinucci and Robinson (1999), which replaces
type II fBM with type I fBM.)
Note that the normalisation needed to ensure the convergence of ρ̂ depends on d
and, in general, one can refer to the limiting distribution of T^{min(1,1+2d)}(ρ̂ − 1).
In the case that d = 0, one recovers the standard unit root case from the first of
these expressions (3.78).
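For simulation work with DGPs such as (3.77a)-(3.77b), the MA(∞) weights of (1 − L)^{−d} can be built recursively (ψ0 = 1, ψj = ψ_{j−1}(j − 1 + d)/j). A minimal sketch, assuming a type II (zero pre-sample) initialisation:

```python
def arfima_noise(d, eps):
    """u_t = (1 - L)^{-d} eps_t with pre-sample values set to zero.
    The MA weights satisfy psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j."""
    psi = [1.0]
    for j in range(1, len(eps)):
        psi.append(psi[-1] * (j - 1 + d) / j)
    return [sum(psi[j] * eps[t - j] for j in range(t + 1))
            for t in range(len(eps))]
```

Setting d = 0 returns the input shocks unchanged; the partial sums of u_t then give an I(1 + d) series as in (3.77a).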
[Figure: power of the DF test statistics τ̂, τ̂µ and τ̂β plotted against d, for d ∈ (0.4, 1).]
yt = Σ_{r=1}^∞ π_r^{(d)} y_{t−r} + εt,  where π_r^{(d)} ≡ −A_r^{(d)}    (3.80)
   = π^{(d)}(L) yt + εt

Applying the DF decomposition of the lag polynomial to π^{(d)}(L) (see UR, Vol. 1,
chapter 3) we obtain:

Δyt = γ_∞ y_{t−1} + Σ_{j=1}^∞ c_j Δy_{t−j} + εt    (3.81)
(Note that (3.83) follows on comparison of (3.82) with (3.81).) Hence, γ_k will
not, generally, be zero, but it will tend to zero as k → ∞, since Σ_{r=k+1}^∞ π_r^{(d)} → 0.
This means that, despite the generating process not being the one assumed under
the null hypothesis, as k increases it becomes increasingly difficult to discover this
from the ADF regression. It was this result, supported by Monte Carlo simula-
tions, that led Hassler and Wolters (1994) to the conclusion that the probability
of rejection of the unit root null in favour of a fractional alternative decreases
for the τ̂-type ADF test as k increases. An extract from their simulation results is
shown in Table 3.2 below. For example, if d = 0.6, so that Δyt is I(−0.4), then the
probability of rejection, whilst increasing with T, decreases with k, from 86.6%
for ADF(2) to 28.8% for ADF(10) when T = 250. Hassler and Wolters (1994) note
that the PP test, which is designed around the unaugmented DF test, does not
suffer from a loss of power as the truncation parameter in the semi-parametric
estimator of the variance increases.

Table 3.2 Power of ADF τ̂µ test for fractionally integrated series
Source: Extracted from Hassler and Wolters, 1994, table 1.
Notes: The generating process is (1 − L)^d yt = εt; the maintained regression is
Δyt = µ* + γ_k y_{t−1} + Σ_{j=1}^{k−1} c_j Δy_{t−j} + ε_{t,k}, with 5,000 replications.
Krämer (1998) points out that consistency can be retrieved if k, whilst increas-
ing with T, does not do so at too fast a rate. A sufficient condition, where
d ∈ (−0.5, 0.5), to ensure divergence of τ̂ is k = o(T^{1/2+d}), in which case
τ̂ →p −∞ for d < 0 and τ̂ →p +∞ for d > 0. For further details and development see Krämer
(1998). Of course, from a practical point of view, the value of d is not known but
it may be possible in some circumstances to narrow the range of likely values of
d in order to check that this condition is satisfied.
Given difficulties with the standard ADF test, if fractional integration is sus-
pected a more sensible route is to use a test specifically designed to test for this
feature. There are a number of such tests; for example one possibility is simply
to estimate d and test it against a specific alternative. This kind of procedure is
considered in the next chapter. In this chapter we first consider test procedures
based on the general DF approach, but applied to the fractional case.
This section introduces an extension of the standard Dickey-Fuller (DF) test for
a unit root to the case of fractionally integrated series. This is a time series
approach, whereas frequency domain approaches are considered in the next
chapter. The developments outlined here are due to Dolado, Gonzalo and
Mayoral (2002), hereafter DGM, as modified by Lobato and Velasco (2007), here-
after LV. The fractional DF tests are referred to as the FDF tests and the efficient
FDF tests as the EFDF tests.
where εt ∼ iid(0, σε2 ) and γ = (ρ − 1). If yt is I(1), then the regression (3.85) is
unbalanced in the sense that the orders of integration of the regressand and
regressor are different, being I(0) and I(1) respectively. The LS (or indeed any
consistent) estimator of γ should be zero asymptotically to reflect this lack of
balance. The regression (3.85) is relevant to testing the null hypothesis of a unit
root against the alternative of stationarity, that is, in the context of the notation
of FI processes, H0 : d = 1 against HA : d = 0.
The essence of the FDF test is that the regression and testing principle can be
applied even if yt is fractionally integrated. Consider the maintained regression
for the case in which H0 : d = d0 against the simple (point) alternative HA : d
= d1 < d0 ; then, by analogy, the following regression model could be formulated:
Δ^{d0} yt = γΔ^{d1} y_{t−1} + zt    (3.86)

The superscripts on Δ refer to the value of d under the null and the (simple)
alternative hypothesis(es), or under specific non-unit root alternatives, respec-
tively, and the properties of zt have yet to be specified. In the standard DF case,
the null hypothesis is d0 = 1 and the (simple) alternative is dA = 0; thus, under
the null Δ^{d0} yt = Δ¹yt = Δyt and Δ^{d1} y_{t−1} = Δ⁰y_{t−1} = y_{t−1}, resulting in the DF regression
Δyt = γy_{t−1} + εt (so that in this special case zt = εt). However, in the FDF case,
neither value of d is restricted to an integer.
DGM suggest using Equation (3.86), or generalisations of it, to test H0 : d = d0
against the simple (point) alternative HA : d = d1 < d0 or the general alternative
HA : d < d0 . The DF-type test statistics are constructed in the usual way as, for
example, with an appropriately normalised version of γ̂ = (ρ̂ − 1) or the t statistic
on γ̂, for the null that γ = 0, denoted t̂γ . In the case that a simple alternative is
specified, the value of d = d1 under the alternative is known and can be used as
in (3.86); however, the more general (and likely) case is HA : 0 ≤ d < d0 , or some
other composite hypothesis, and a (single) input value of d, say d1 for simplicity
of notation, is required to operationalise the test procedure. In either case LS
estimation of (3.86) gives the following:
γ̂(d1) = Σ_{t=2}^T (Δ^{d0} yt)(Δ^{d1} y_{t−1}) / Σ_{t=2}^T (Δ^{d1} y_{t−1})²    (3.87)

tγ(d1) = γ̂(d1) / σ_{d1}(γ̂)    (3.88)

σ_{d1}(γ̂) = [σ²_{d1} / Σ_{t=2}^T (Δ^{d1} y_{t−1})²]^{1/2}

σ²_{d1} = Σ_{t=2}^T ε̂t² / T
In the special case where d0 = 1, γ̂(d1) is given by:

γ̂(d1) = Σ_{t=2}^T (Δyt)(Δ^{d1} y_{t−1}) / Σ_{t=2}^T (Δ^{d1} y_{t−1})²    (3.89)
where s is the integer part of (d1 + ½), which reflects the assumption that pre-
sample values of yt are zero. (DGM, op. cit., report that this initialisation,
compared to the alternative that the sequence {yt}_{t=−∞}^{0} exists, has only minor
effects on the power of the test; see also Marinucci and Robinson, 1999, for
discussion of the general point.)
For the case of interest here, there are two key results, as follows, that inform
subsequent sections. Let the data be generated by a random walk with iid errors:

Δyt = εt    (3.92)

where εt ∼ iid(0, σ_ε²) and E|εt⁴| < ∞. Then:

tγ(d1) ⇒_D N(0, 1)    (3.93)
see DGM, op. cit., theorem 5. Thus provided that the range of d̂T is restricted
(although not unduly so), standard N(0, 1) testing procedures apply.
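The computations in (3.87) and (3.88) are simple to sketch. The helper below builds the truncated fractional difference Δ^d (pre-sample values zero, as assumed above) and returns the t statistic on γ̂; the series and input values are purely illustrative:

```python
import math

def frac_coeffs(d, n):
    # A_r^{(d)} = (-1)^r C(d, r), via the recursion A_r = A_{r-1} (r - 1 - d) / r
    a = [1.0]
    for r in range(1, n + 1):
        a.append(a[-1] * (r - 1 - d) / r)
    return a

def frac_diff(d, y):
    # truncated Delta^d y_t = sum_{r=0}^{t} A_r^{(d)} y_{t-r}
    a = frac_coeffs(d, len(y) - 1)
    return [sum(a[r] * y[t - r] for r in range(t + 1)) for t in range(len(y))]

def fdf_t_stat(y, d0, d1):
    """LS t statistic on gamma in Delta^{d0} y_t = gamma Delta^{d1} y_{t-1} + eps_t,
    as in (3.87)-(3.88)."""
    x0 = frac_diff(d0, y)
    x1 = frac_diff(d1, y)
    num = sum(x0[t] * x1[t - 1] for t in range(1, len(y)))
    den = sum(x1[t - 1] ** 2 for t in range(1, len(y)))
    gamma = num / den
    s2 = sum((x0[t] - gamma * x1[t - 1]) ** 2 for t in range(1, len(y))) / len(y)
    return gamma / math.sqrt(s2 / den)
```

Under the unit root null, d0 = 1 and the regressand is simply Δyt; the statistic is then compared with standard normal critical values, as in (3.93).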
ϕ(L)Δyt = Δ^{1−d} εt
        = εt + (d − 1)ε_{t−1} + 0.5d(d − 1)ε_{t−2} + · · ·    (3.95)

The second line of (3.95) uses the binomial expansion of Δ^{1−d} ≡ (1 − L)^{1−d};
see Equations (3.3) to (3.5). The third line of (3.95) uses ϕ(L)Δ^d y_{t−1} =
ε_{t−1}. Substituting for ϕ(L) in (3.95) and using the BN decomposition:

Δyt = γΔ^d y_{t−1} + [1 − ϕ(L)]Δyt + z*_t    (3.97b)
    = γΔ^d y_{t−1} + Σ_{j=1}^p ϕ_j Δy_{t−j} + z*_t    (3.97c)

γ = (d − 1)ϕ(1)    (3.97d)

z*_t = zt + (d − 1)ϕ*(L)Δ^{1+d} y_{t−1}    (3.97e)
Note that (3.97a) follows on adding [1 − ϕ(L)]Δyt to both sides of the previous
equation, where 1 − ϕ(L) = Σ_{j=1}^p ϕ_j L^j. Also note that if d = 1, then the terms in
(d − 1) disappear and, therefore, ϕ(L)Δyt = εt; if d = 0, then ϕ(L)yt = εt (see Q3.5);
also, if ϕ(1) > 0, then γ becomes more negative as d falls further below unity. Note
also that the case with ϕ(1) = 1 − Σ_{j=1}^p ϕ_j = 0 is ruled out because it implies a
unit root (or roots) in ϕ(L), and ϕ(1) > 0 implies the stability condition
Σ_{j=1}^p ϕ_j < 1. This, as in the simple case, motivates the estimation of (3.97c)
and use of the t-statistic on γ̂. (However, as in the simple case, LS estimation is
inefficient because of the serially correlated property of z*_t; see the EFDF test below.)
In the DGM procedure, d is (generally) replaced by d1, and the regression to
be estimated, the AFDF(p) regression, is:

Δyt = γΔ^{d1} y_{t−1} + Σ_{j=1}^p ϕ_j Δy_{t−j} + z*_t(d1)    (3.98)
Replacement of d1 with d̂T , where d̂T is a consistent estimator, does not alter the
consistency of the test. We also discuss later (see section 3.8.3) an alternative
formulation of the augmented FDF test, the pre-whitened AFDF, due to Lobato
and Velasco (2006), which considers the optimal choice of d1 .
FI(d) process:

Δ^d yt = εt 1_{(t>0)}    (3.99)

The start-up condition εt 1_{(t>0)} indicates that the process is initialized at the start
of the sample (which, incidentally, identifies the resulting fractional Brownian
motion as type II). Where the context is clear, this initialization will be left
implicit. The usual assumptions on εt are that it has finite fourth moment and
the sequence {εt} is iid.
Now note that by a slight rearrangement (3.99) may be written as follows:

Δ^{d−1} Δyt = εt ⇒
Δyt = Δ^{1−d} εt
    = εt + (d − 1)ε_{t−1} + 0.5d(d − 1)ε_{t−2} + · · ·

The third line uses ε_{t−1} = Δ^d y_{t−1}, from one lag of Equation (3.99), and note that:

Δyt ≡ Δyt − Δ^d yt + εt
    ≡ (1 − Δ^{d−1})Δyt + εt
This is, so far, just a restatement of an FI(d) process, and though in that sense
trivial it maintains εt as the ‘error’, in contrast to the serially correlated zt in
Equation (3.100). It may be interpreted as a testing framework in the following
way. First, introduce the coefficient π on the variable (Δ^{d−1} − 1)Δyt, which then
enables a continuum of cases to be distinguished, including d = 0 and d = 1.
That is:

Δyt = π(Δ^{d−1} − 1)Δyt + εt

If π = 0, then Δyt ≡ εt, so that d = 1, whereas if π = −1, then Δ^d yt ≡ εt for d ≠ 1.
Although the variable (Δ^{d−1} − 1)Δyt appears to include Δyt, this is not the
case. Specifically, from an application of Equation (3.3), note that Δ^{d−1}Δyt =
Σ_{r=0}^{t−1} A_r^{(d−1)} Δy_{t−r}, where A_0^{(d−1)} ≡ 1 and A_r^{(d−1)} ≡ (−1)^r C(d−1, r), and, therefore:

(1 − Δ^{d−1})Δyt = Δyt − [1 − (d − 1)L + A_2^{(d−1)} L² + · · · ]Δyt
                = (d − 1)Δy_{t−1} − Σ_{r=2}^{t−1} A_r^{(d−1)} L^r Δyt    (3.104)
Δ^{d1} yt = Δ^{d1−d} εt    (3.105)

Using the input value d1, the suggested regressor and associated quantities are:

z_{t−1}(d1) ≡ [(Δ^{d1−1} − 1)/(1 − d1)] Δyt    (3.108)

η = π(1 − d1)    (3.109)

vt = Δ^{d1−d} εt    (3.110)
This set-up, therefore, differs from that of DGM because the suggested regressor
is z_{t−1}(d1) and not Δ^{d1} y_{t−1}. The problem as d1 → 1 is solved because apply-
ing L'Hôpital's rule results in (Δ^{d1−1} − 1)/(1 − d1) → −ln(1 − L) = Σ_{j=1}^∞ j^{−1} L^j.
Considering this limiting case, the regression model becomes:

Δyt = η Σ_{j=1}^∞ j^{−1} Δy_{t−j} + vt    (3.111)
d̂T = d1 + op(T^{−κ}),  d̂T > 0.5, d1 > 0.5, κ > 0, d̂T ≠ 1    (3.113)

See LV, theorem 1, who show that the test statistic(s) is, again, normally
distributed in large samples.
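The regressor z_{t−1}(d1) in (3.108) is a one-sided filter of Δyt; its lag-j weight is A_j^{(d1−1)}/(1 − d1), which tends to 1/j as d1 → 1, consistent with the L'Hôpital limit leading to (3.111). A small sketch verifying the weights:

```python
def frac_coeffs(d, n):
    # A_r^{(d)} = (-1)^r C(d, r), via A_r = A_{r-1} * (r - 1 - d) / r
    a = [1.0]
    for r in range(1, n + 1):
        a.append(a[-1] * (r - 1 - d) / r)
    return a

def efdf_weights(d1, n):
    """Lag-j coefficients, j = 1..n, of (Delta^{d1-1} - 1) / (1 - d1),
    the filter applied to Delta y_t to form z_{t-1}(d1) in (3.108)."""
    a = frac_coeffs(d1 - 1.0, n)
    return [a[j] / (1.0 - d1) for j in range(1, n + 1)]
```

The lag-1 weight is exactly 1 for any d1, and for d1 close to 1 the weights are close to the hyperbolic pattern 1/j of the limiting regression (3.111).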
ϕ(L)zt = εt 1_{(t>0)}

Alternatively, LV suggest first estimating the coefficients in 1 − ϕ(L) = Σ_{j=1}^p ϕ_j L^j,
then using the estimates ϕ̂_j of ϕ_j in the second stage to form η[ϕ̂(L)z_{t−1}(d̂T)]:

Δyt = η[ϕ̂(L)z_{t−1}(d̂T)] + Σ_{j=1}^p ϕ_j Δy_{t−j} + v̂t    (second step)    (3.121)

z_{t−1}(d̂T) = [(Δ^{d̂T−1} − 1)/(1 − d̂T)] Δyt    (3.122)
Notice that the coefficients ϕ̂_j are not used in the second set of terms as, in this
case, they relate to the null value d0 = 1, with Δ^{d0} y_{t−j} = Δy_{t−j}. As before, the test
statistic is the t test, tη(d̂T), on the η coefficient. Also, DGM suggest a one-step
procedure; for details see DGM (2008, section 3.2.2 and 2009, section 12.6). Again:

tη(d̂T) ⇒_D N(0, 1)
Note that the DGP so far considered has assumed that there are no deterministic
components in the mean. In practice this is unlikely. As in the case of standard
DF tests, the leading specification in this context is to allow the fractionally inte-
grated process to apply to the deviations from a deterministic mean function,
usually a polynomial function of time, as in Equation (3.42). For example, the
ARFIMA(p, d, q) model with mean or trend function µt is:

θ(L)^{−1} ϕ(L) Δ^d ỹt = εt    (3.124)

ỹt ≡ yt − µt    (3.125)

and, in the simplest case with ϕ(L) = θ(L) = 1:

Δ^d ỹt = εt    (3.126)

Thus, as in the standard unit root framework, the subsequent analysis is in terms
of observations that are interpreted as deviations from the mean (or detrended
observations), ỹt. As usual, the cases likely to be encountered in practice are
µt = 0, µt = y0 and µt = β0 + β1 t. The presence of a mean function has
implications for the generation of data for the regressor(s) in the FDF and EFDF
tests. It is now necessary to generate a regressor variable of the form Δ^{d1} ỹ_{t−1},
where ỹt = yt − µt, and application will require a consistent estimator µ̂t of µt
and, hence, ŷt = yt − µ̂t.
The simplest case to consider is µt = y0. Then the generating model is:

yt = y0 + w(L)Δ^{−d} εt    (3.127)

Notice that E(yt) = y0, given E(εt) = 0, so the initial condition can be viewed
as the mean of the process. Let Δ_s^d be the truncated (finite sample) operator
(defined below); then it does not follow that Δ_s^d ỹt = Δ_s^d yt, except asymptotically
and given a condition on d. The implication is that adjusting the data may be
desirable to protect against finite sample effects (and is anyway essential in the
case of a trend function). The notation in this section will explicitly acknowledge
the finite sample nature of the Δ^d operator, a convention that was previously
left implicit.
The argument is as follows. First define the truncated binomial operator Δ_s^d,
which is appropriate for a finite sample, that is:

Δ_s^d = Σ_{r=0}^s A_r^{(d)} L^r    (3.128)

⇒ Δ_s^d yt = Σ_{r=0}^s A_r^{(d)} y_{t−r} = yt + A_1^{(d)} y_{t−1} + · · · + A_s^{(d)} y_{t−s}    (3.129)

and Δ_s^d(1) = Σ_{r=0}^s A_r^{(d)}, that is, Δ_s^d evaluated at L = 1.
Next consider the definition ỹt ≡ yt − y0 (the argument is relevant for y0
known), so that:

Δ_s^d ỹt = Δ_s^d (yt − y0)

Hence, Δ_s^d ỹt and Δ_s^d yt differ by −Δ_s^d y0 = −y0 Δ_s^d(1), which is not generally 0,
as it is for d = 1 (that is, the standard first difference operator, for which Δ_s^1(1) = 0).
Using a result derived in Q3.5.a, Δ_s^d(1) = A_s^{(d−1)} and, therefore:

Δ_s^d ỹt − Δ_s^d yt = −A_s^{(d−1)} y0    (3.131)

Δ_s^d(1) = A_s^{(d−1)} → 0 is sufficient for A_s^{(d−1)} y0 → 0 as s → ∞. Referring to Equation
(3.20), but with d − 1 in place of d, A_s^{(d−1)} → 0 if d > 0. If s = t − 1, as is usually the case,
then Δ_{t−1}^d ỹt − Δ_{t−1}^d yt = −A_{t−1}^{(d−1)} y0.
The result in (3.131) implies the following for s = t − 1 and the simple FI(d)
process Δ_s^d ỹt = εt:

Δ_{t−1}^d yt = Δ_{t−1}^d(1) y0 + εt
           = A_{t−1}^{(d−1)} y0 + εt
and for t = 2, . . . , T, this gives rise to the sequence:

Δ_1^d y2 = A_1^{(d−1)} y0 + ε2,  Δ_2^d y3 = A_2^{(d−1)} y0 + ε3,  . . . ,  Δ_{T−1}^d yT = A_{T−1}^{(d−1)} y0 + εT
Notice that even though y0 is a constant, terms of the form A_s^{(d−1)} y0 are not, as
they vary with s. Under the null hypothesis d = 1, all of these terms disappear,
but otherwise they do not vanish for d ∈ [0, 1), becoming less important as d → 1
and as s → ∞. The implication of this argument is that demeaning yt using an
estimator of y0 that is consistent under the alternative should, in general, lead
to a test with better finite sample power properties.
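The identity in (3.131) is easy to verify numerically; the sketch below (illustrative numbers) applies the truncated operator with s = t to both ỹt = yt − y0 and yt, and checks that the difference equals −A_t^{(d−1)} y0:

```python
def frac_coeffs(d, n):
    # A_r^{(d)} = (-1)^r C(d, r), via A_r = A_{r-1} * (r - 1 - d) / r
    a = [1.0]
    for r in range(1, n + 1):
        a.append(a[-1] * (r - 1 - d) / r)
    return a

def trunc_diff(d, y):
    """Truncated Delta_s^d y_t with s = t (all available lags,
    pre-sample values treated as zero)."""
    a = frac_coeffs(d, len(y) - 1)
    return [sum(a[r] * y[t - r] for r in range(t + 1))
            for t in range(len(y))]
```

Because the partial sums of the A_r^{(d)} equal A_s^{(d−1)}, the gap between the two filtered series is exactly −A_t^{(d−1)} y0 at each t, shrinking as t grows when d > 0.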
The linear trend case is now simple to deal with. In this case µt = β0 + β1 t and:

Δ_s^d ỹt = Δ_s^d [yt − (β0 + β1 t)] = Δ_s^d yt − Δ_s^d(1)β0 − β1 Δ_s^d t = Δ_s^d yt − A_s^{(d−1)} β0 − β1 Δ_s^d t

and it is evident that Δ_s^d ỹt ≠ Δ_s^d yt. Even if the finite sample effect of A_s^{(d−1)} β0 is
ignored, the term due to the trend remains.
Shimotsu (2010) has suggested a method of demeaning or detrending that
depends on the value of d. The context of his method is semi-parametric esti-
mation of d, but it also has relevance in the present context. In the case that y0 is
unknown, two possible estimators of y0 are the sample mean ȳ = T^{−1} Σ_{t=1}^T yt and
the first observation, y1. ȳ is a good estimator of y0 when |d| < 0.5, whereas y1 is
a good estimator of y0 when d > 0.75, and both are good estimators for d ∈ (0.5,
0.75); for further details see Chapter 4, Section 4.9.2.iv. Shimotsu (2010) sug-
gests the estimator ŷ0 that weights ȳ and y1, using the weighting function κ(d):

ŷ0 = κ(d)ȳ + [1 − κ(d)]y1    (3.132)
A possibility for κ(d) when d ∈ (½, ¾) is κ(d) = 0.5(1 + cos[4π(d − 0.5)]), which
weights ŷ0 toward y1 as d increases. In practice, d is unknown and is replaced
by an estimator that is consistent for d ∈ (0, 1). For example, replacing d by d̂T
results in an estimated value of y0 given by:

ŷ0 = κ(d̂T)ȳ + [1 − κ(d̂T)]y1

The demeaned data ŷt ≡ yt − ŷ0 are then used in the FDF and EFDF tests, and d̂T
is the value of d required to operationalise the test (or d1 where an estimator is
not used).
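A minimal sketch of this weighting (the cosine form of κ(d) stated above, extended with the natural boundary values κ = 1 for d ≤ ½ and κ = 0 for d ≥ ¾, which is an assumption made here for completeness):

```python
import math

def kappa(d):
    """Weight on the sample mean: 1 for d <= 1/2, 0 for d >= 3/4,
    and the smooth cosine transition of the text in between (assumed
    boundary behaviour outside (1/2, 3/4))."""
    if d <= 0.5:
        return 1.0
    if d >= 0.75:
        return 0.0
    return 0.5 * (1.0 + math.cos(4.0 * math.pi * (d - 0.5)))

def y0_hat(y, d_hat):
    """Shimotsu-style estimator of y0: weighted average of the sample
    mean and the first observation, as in (3.132)."""
    ybar = sum(y) / len(y)
    k = kappa(d_hat)
    return k * ybar + (1.0 - k) * y[0]
```

For small d̂T the estimator is essentially the sample mean; for d̂T near the nonstationary region it shifts toward the first observation, exactly the trade-off described above.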
Next consider the linear trend case, so that µt = β0 + β1 t (note that β0 ≡ y0).
The LS residuals are ŷt = yt − β̂0 − β̂1 t. Since the residuals have a zero sample
mean by the properties of LS regression, the weighting function applied to the
residuals ŷt simplifies relative to the κ(d̂T) weighting of Equation (3.132); thus, let:

ŷ0 = [1 − κ(d̂T)]ŷ1    (3.135)
Under the null, d = 1, so that:

Δyt = β1 + εt    (3.137)

and the LS estimator of β1 is β̂1 = (T − 1)^{−1} Σ_{t=2}^T Δyt. Under H0, β̂1 is consistent
at rate T^{1/2} and under HA it is consistent at rate T^{(3/2)−d} (see DGM, 2008). The
adjusted observations are obtained as:

Δŷt = Δyt − β̂1    (3.138)
The revised regressions for the FDF and EFDF tests (in the simplest cases) using
adjusted data are as follows, where ŷt generically indicates estimated mean or
trend adjusted data as the case requires:

FDF:    Δŷt = γ Δ_{t−1}^{d̂T} ŷ_{t−1} + zt    (3.139)

EFDF:   Δŷt = η z_{t−1}(d̂T) + vt    (3.140)

z_{t−1}(d̂T) = [(Δ_{t−1}^{d̂T−1} − 1)/(1 − d̂T)] Δŷt    (3.141)
As before, the test statistics are the t-type statistics on γ and η, respectively,
considered as functions of d̂T.
In the case of the DF tests, the limiting null distributions depend on µt, leading
to, for example, the test statistics τ̂, τ̂µ and τ̂β; however, that is not the case for
the DGM, LV and LM tests considered here. Provided that the trend function is
replaced by a consistent estimator, say µ̂t, the limiting distributions remain
the same as in the case with µt = 0.
3.7 Locally best invariant (LBI) tests
This section outlines a locally best invariant (LBI) test due to Tanaka (1999)
and Robinson (1991, 1994a), the former in the time domain and the latter in
the frequency domain. This test is also asymptotically uniformly most powerful
invariant, UMPI. The test, referred to generically as an LM test, is particularly
simple to construct in the case of no short-run dependence in the error; and
whilst it is somewhat more complex in the presence of short-run dynamics, a
development due to Agiakloglou and Newbold (1994) and Breitung and Hassler
(2002) leads to a test statistic that can be readily computed from a regression
that is analogous to the ADF regression in the standard case.
An advantage of an LM-type test is that it only requires estimation under the
null hypothesis, which is particularly attractive in this case as the null hypoth-
esis leads to the simplification that the estimated model uses first differenced
data. This contrasts with test statistics that are based on the Wald principle, such
as the DGM and LV versions of the FDF test, which also require specification
and, practically, estimation of the value of d under the alternative hypothesis.
Tanaka’s results hold for any d; specifically, for typical economic time series
where d ≥ 0.5, and usually d ∈ [0.5, 1.5), so that d is in the nonstationary
region. The asymptotic results do not require Gaussianity.
(1 − L)^{d+c} yt = εt    (3.142)

The case emphasised so far has been d = d0 = 1, that is, there is a unit root, and a
test of the null hypothesis H0: c = 0 against the alternative HA: c < 0 is, in effect, a
test of the unit root null against the left-sided alternative, with local alternatives
of the form c = δ/√T. However, other cases can be treated within this framework,
for example H0: d = d0 = ½, so that the process is borderline nonstationary,
against HA: d = d0 + c with c < 0, implying that the generating process is stationary;
or again with d0 = 1 but HA: d = d0 + c, c > 0, so that the alternative is explosive.
Hereafter, the initialization εt 1t>0 will be left implicit; and note that normality
is not required for the asymptotic results to hold and can be replaced with an
iid assumption. The (conditional) log-likelihood function is:
LL({yt}; c, σ_ε²) = −(T/2) log(2πσ_ε²) − [1/(2σ_ε²)] Σ_{t=1}^T [(1 − L)^{d+c} yt]²    (3.144)
Let xt ≡ (1 − L)^d yt; then the first derivative of LL(·) with respect to c (the score),
evaluated at c = 0, is:

ST1 ≡ ∂LL({yt}; c, σ_ε²)/∂c |_{H0: c = 0, σ_ε² = σ̂_ε²}
    = (1/σ̂_ε²) Σ_{t=2}^T xt Σ_{j=1}^{t−1} j^{−1} x_{t−j}
    = T Σ_{j=1}^{T−1} j^{−1} ρ̂(j)    (3.145)

ρ̂(j) = Σ_{t=j+1}^T xt x_{t−j} / Σ_{t=1}^T xt²    (3.146)

σ̂_ε² = T^{−1} Σ_{t=2}^T (Δ^d yt)² = T^{−1} Σ_{t=2}^T xt²
ρ̂(j) is the j-th autocorrelation coefficient of {xt }; more generally ρ̂(j) is the j-th
autocorrelation coefficient of the residuals (obtained under H0 ) and σ̂ε2 is an
estimator of σε2 .
In the case of the unit root null, H0: d = d0 = 1, we have xt = Δyt and, therefore,
ρ̂(j) simplifies to:

ρ̂(j) = Σ_{t=j+1}^T Δyt Δy_{t−j} / Σ_{t=1}^T (Δyt)²    (3.147)
ST1 = (1/σ̂_ε²) Σ_{t=2}^T xt x*_{t−1}    (3.149)

(See Q3.7.) Apart from the scale factor σ̂_ε^{−2}, this is the numerator of the
least squares estimator of the coefficient in the regression of xt on x*_{t−1}. A
suitably scaled version of ST1 gives the LM statistic:

LM0 ⇒_D N(δ(π²/6)^{1/2}, 1)    (3.151)
Hence, under the null δ = 0 and LM0 ⇒D N(0, 1), leading to the standard decision
rule that, with asymptotic size α, reject H0 : c = 0 against HA : c < 0 if LM0 < zα ,
and reject H0 against HA : c > 0 if LM0 > z1−α , where zα and z1−α are the lower
and upper quantiles of the standard normal distribution. Alternatively for a
two-sided alternative take the square of LM0 and compare this with the critical
values from χ2 (1), rejecting for ‘large’ values of the test statistic.
The LM0 test is locally best invariant, LBI, and asymptotically uniformly
most powerful invariant, UMPI. For a detailed explanation of these terms, see
Hatanaka (1996, especially section 3.4.1). In summary:
locally: the alternatives are local to d0, that is d0 + c, where c > 0 (or c < 0), but
c ≈ 0; the ‘local’ parameterization here is c = δ/√T.
best: considering the power function of the test statistic at d0 and d0 + c, the
critical region for a size α test is chosen such that the slope of the power function
is steepest.
asymptotically: as T → ∞.
Tanaka (1999, theorem 3.2 and corollary 3.2) shows that asymptotically the
power of LM0 coincides with the power envelope of the locally best invariant
test.
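In the simple case of no short-run dynamics, the LM0 statistic is just √T Σ_{j} j^{−1}ρ̂(j) scaled by ω = (π²/6)^{1/2}; a self-contained sketch, taking xt = (1 − L)^{d0} yt as given:

```python
import math

def lm0(x):
    """LM0 = sqrt(T) * sum_{j=1}^{T-1} rho_hat(j)/j, divided by omega,
    with omega^2 = pi^2/6 (the no-short-run-dynamics case)."""
    T = len(x)
    denom = sum(v * v for v in x)
    s = 0.0
    for j in range(1, T):
        rho_j = sum(x[t] * x[t - j] for t in range(j, T)) / denom
        s += rho_j / j
    omega = math.sqrt(math.pi ** 2 / 6.0)
    return math.sqrt(T) * s / omega
```

Negative first-order autocorrelation in xt pushes LM0 below zero, the direction associated with rejection in favour of c < 0.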
x = Z̃β + ε    (3.154)
ST1 ≡ ∂LL({yt}; c, σ_ε²)/∂c |_{H0: c = 0, σ_ε² = σ̂_ε²}
    = −(1/σ̂_ε²) Σ_{t=1}^T [ln(1 − L)(1 − L)^d (yt − Zt β̂)] (1 − L)^d (yt − Zt β̂)
    = −(1/σ̂_ε²) Σ_{t=1}^T [ln(1 − L)ε̂t] ε̂t
    = T Σ_{j=1}^{T−1} j^{−1} ρ̂(j)    (3.155)
where ρ̂(j) = Σ_{t=j+1}^T ε̂t ε̂_{t−j} / Σ_{t=1}^T ε̂t² is the j-th autocorrelation coefficient of the
residuals ε̂t (the derivation of the last line is the subject of an end-of-chapter
question). H0: c = 0 is rejected against c > 0 for large values of ST1 and rejected
against c < 0 for small values of ST1. As noted above, a suitably scaled version
of ST1 has a standard normal distribution, which then provides the appropriate
quantiles.
We can now consider some special cases. The first case, of no determinis-
tics, has already been considered: there Z is empty and y is generated
as Δ^d y = ε. In this case, the value of d is given under the null and no param-
eters are estimated. The test statistic is based on x = Δ^d y, so that for d = 1,
x = Δy = (y2 − y1, . . . , yT − y_{T−1})′. It is also important to consider how to deal
with the presence of a linear trend (the test statistics are invariant to y0 ≠ 0).
Suppose the generating process is:
yt = β0 + β1 t + Δ^{−d} εt    (3.156)
   = Zt β + Δ^{−d} εt

where Zt = (1, t) and β = (β0, β1)′. β̂ is obtained as β̂ = (Z̃′Z̃)^{−1} Z̃′x where, under the
unit root null d = 1, x = Δy and Z̃ = ΔZ = (z̃1, z̃2), with z̃1 a column of 0s and
z̃2 a column of 1s; the first element of β̂ is, therefore, annihilated and β̂1 is the
LS coefficient from the regression of Δyt on a constant, equivalently the sample
mean of Δyt. The residuals on which the LM test is based are, therefore, ε̂t =
xt − Z̃t β̂ = Δyt − β̂1, and the adjustment is, therefore, as described in Section 3.6.
LL({yt}; c, ψ, σ_ε²) = −(T/2) log(2πσ_ε²) − [1/(2σ_ε²)] Σ_{t=1}^T [θ(L)^{−1}ϕ(L)(1 − L)^{d0+c} yt]²    (3.158)
where ψ = (ϕ′, θ′)′ collects the coefficients in ϕ(L) and θ(L). The score with respect
to c is as before, but with the residuals now defined to reflect estimation of the
short-run dynamics:

ST2 ≡ ∂LL({yt}; c, ψ, σ_ε²)/∂c |_{H0: c = 0, σ_ε² = σ̂_ε², ψ = ψ̂}
    = T Σ_{j=1}^{T−1} j^{−1} ρ̂(j)    (3.159)
where ρ̂(j) is the j-th order autocorrelation coefficient of the residuals, given by:

ρ̂(j) = Σ_{t=j+1}^T ε̂t ε̂_{t−j} / Σ_{t=1}^T ε̂t²    (3.160)
Notice that the residuals are estimated using the null value d0 , hence d itself is
not a parameter that is estimated. Then, see Tanaka (op. cit., theorem 3.3):
ST2/√T = √T Σ_{j=1}^{T−1} j^{−1} ρ̂(j)    (3.162)
       ⇒_D N(δω², ω²),  where c = δ/√T

so that, on division by ω:

LM0 ⇒_D N(δω, 1)

and if δ = 0, then:

LM0 ⇒_D N(0, 1)
where gj and hj are the coefficients on Lj in the expansions of ϕ(L)^{−1} and θ(L)^{−1},
respectively, and the matrix appearing in (3.163) is the Fisher information matrix
for ψ = (ϕ′, θ′)′.
To operationalise (3.163) following the LM principle, a consistent estimate of
ω under the null denoted ω̂0 is substituted for ω, resulting in the feasible LM
test statistic, LM0,0 :
LM0,0 = (√T/ω̂0) Σ_{j=1}^{T−1} j^{−1} ρ̂(j)    (3.166)
      ⇒_D N(δω, 1)
The test can also be implemented through a regression approach due to Agiakloglou and
Newbold (1994), as developed by Breitung and Hassler (2002). First consider the simple case in which the data are generated as
(1 − L)^{d+c} yt = εt, and note that the structure of ST1 suggested the least squares
regression of xt on x*_{t−1}, that is:

xt = α x*_{t−1} + et    (3.167)

xt ≡ Δ^d yt    (3.168)

x*_{t−1} ≡ Σ_{j=1}^{t−1} j^{−1} x_{t−j}    (3.169)
The least squares estimator and associated test statistic for H0: c = 0 against HA:
c ≠ 0 translate to H0: α = 0 and HA: α ≠ 0. The LS estimator of α and the square
of the t statistic for α̂ are, respectively:

α̂ = Σ_{t=2}^T xt x*_{t−1} / Σ_{t=2}^T (x*_{t−1})²    (3.170)

tα² = [Σ_{t=2}^T xt x*_{t−1}]² / [σ̂e² Σ_{t=2}^T (x*_{t−1})²]    (3.171)

where σ̂e² = T^{−1} Σ_{t=2}^T êt² and êt = xt − α̂ x*_{t−1}.
Breitung and Hassler (2002, theorem 1) show that for data generated as
(1 − L)^{d+c} yt = εt 1_{(t>0)}, with εt ∼ (0, σ_ε²) and σ_ε² < ∞, the t statistic tα is
asymptotically standard normal under the null; the last line of the derivation
assumes the initialization y0 = 0. Thus, the difference between
the two approaches is that the regressor y*_{t−1} weights the lagged first differences,
with weights that decline in the pattern (1, ½, ⅓, . . . , 1/(t − 1)).
Note that the test statistic is here presented in the form tα², as HA: c ≠ 0;
however, the more likely case is HA: c < 0, in which case the test statistic tα can be used
with left-sided critical values from N(0, 1).
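The regression (3.167)-(3.169) is straightforward to set up; the sketch below builds x*_{t−1} with the declining weights (1, ½, ⅓, …) and returns tα (illustrative data only):

```python
import math

def bh_t_stat(x):
    """t statistic on alpha in the regression x_t = alpha * x*_{t-1} + e_t,
    where x*_{t-1} = sum_{j=1}^{t-1} x_{t-j} / j (Breitung-Hassler form),
    with pre-sample values of x treated as zero."""
    T = len(x)
    pairs = []
    for t in range(1, T):
        xstar = sum(x[t - j] / j for j in range(1, t + 1))  # weights 1, 1/2, 1/3, ...
        pairs.append((x[t], xstar))
    num = sum(a * b for a, b in pairs)
    den = sum(b * b for _, b in pairs)
    alpha = num / den
    s2 = sum((a - alpha * b) ** 2 for a, b in pairs) / T
    return alpha / math.sqrt(s2 / den)
```

With xt = Δyt (the unit root null d0 = 1), tα is compared with left-sided N(0, 1) critical values, exactly as described above.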
xt ≡ (1 − L)^d yt

Then under the null d = d0 and xt ≡ (1 − L)^{d0} yt can be formed. The resulting
p-th order autoregression in xt is:

xt = ϕ1 x_{t−1} + · · · + ϕp x_{t−p} + εt    (3.179)

Notice that the regressors x_{t−i} are also included in the LM regression. (See
Wooldridge, 2011, for a discussion of this point for stochastic regressors.) The
suggested test statistics are tα or tα2 , which are asymptotically distributed as
N(0, 1) and χ2 (1) respectively.
In the case of the invertible ARFIMA(p, d, q) model, the generating process is:
ϕ(L)(1 − L)^d yt = θ(L)εt, with xt ≡ (1 − L)^d yt
Then, as before, under the null d = d0 and xt ≡ (1 − L)^{d0} yt. In this case A(L) is
an infinite order lag polynomial and, practically, is approximated by truncating
an infinite order lag polynomial and, practically, is approximated by truncating
the lag to a finite order, say k. A reasonable conjecture based on existing results
is that provided the order of the approximating lag polynomial expands at an
appropriate rate, then the limiting null distributions of tα and tα2 are maintained
as in the no serial correlation case. The suggested rate for the ADF(k) test is
k → ∞ as T → ∞ such that k/T^{1/3} → 0 (see also Ng and Perron (2001) and UR Vol. 1,
chapter 9).
td̂ = (d̂ − d0)/(ω^{−1}/√T) (3.186)
    = √T ω(d̂ − d0) (3.187)
    ⇒D N(0, 1)
The first statement, (3.185), says that √T(d̂ − d) is asymptotically normally dis-
tributed with zero mean and variance ω^{−2}. The ‘asymptotic variance’ of √T d̂
is ω^{−2} and the ‘asymptotic standard error’ of d̂ is ω^{−1}/√T; therefore, standardising
(d̂ − d0) by ω^{−1}/√T gives the t-type statistic, denoted td̂ in (3.186). This results in
a quantity that is asymptotically distributed as standard normal. In prac-
tice, one could use the estimator of ω^{−1} obtained from ML estimation to obtain
a feasible (finite sample) version of td̂.
In the simplest case ω² = π²/6, therefore ω^{−2} = (π²/6)^{−1} = 6/π², √T(d̂ −
d) ⇒D N(0, 6/π²) and the test statistic is td̂ = 1.2825√T(d̂ − d0) ⇒D N(0, 1),
where 1.2825 = (π²/6)^{1/2} = ω. The more general case is considered in a question (see Q3.8).
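In code, the simplest-case Wald statistic is just a rescaling of (d̂ − d0); a sketch with purely illustrative input values (the function name is not from any package):

```python
import math

def tanaka_wald(d_hat, d0, T, omega2=math.pi ** 2 / 6):
    """t-type Wald statistic: sqrt(T) * omega * (d_hat - d0) => N(0, 1).
    In the simplest case omega = sqrt(pi^2/6) ~= 1.2825."""
    return math.sqrt(T) * math.sqrt(omega2) * (d_hat - d0)

# illustrative numbers only: suppose d_hat = 0.93 from a sample of T = 200
t = tanaka_wald(0.93, 1.0, 200)   # negative: evidence against d0 = 1
```

With a left-sided alternative (d < 1), rejection occurs when t falls below the chosen N(0, 1) quantile, e.g. −1.645 at 5%.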
Tanaka (op. cit.) reports some simulation results to evaluate the Wald and
LM tests (in the forms LM0 and LM0,0 ) and compare them with the limiting
power envelope. The data-generating process is an ARFIMA(1, d, 0) model, with
niid input, T = 100, and 1,000 replications with a nominal size of 5%. The
results most likely to be of interest in an economic context are those around the
stationary/nonstationary boundary, with H0 : d0 = 0.5 against HA : dA > 0.5 and,
alternately, H0 : d0 = 1 against HA : dA < 1.
With a simple fractional integrated process, the simulated size is 5.7% for
LM0 and 3.9% for td̂ using ω2 = (π2 /6), and 3.9% and 3.5%, respectively, when
ω2 is estimated. In terms of size-unadjusted power, the LM test is better than
the Wald test, for example when d = 0.6, 36.2% and 30.7% respectively using
ω2 = (π2 /6), and 30% and 29.3% when ω2 is estimated. Given the difference in
nominal size, it may well be the case that there is little to choose between the
tests in terms of size-adjusted power.
In the case of the unit root test against a fractional alternative, ARFIMA(1, d,
0), Tanaka (op. cit.) reports the results with ϕ1 = 0.6 and, alternately, ϕ1 = −0.8;
we concentrate on the former as that corresponds to the more likely case of
positive serial correlation in the short-run dynamics. The simulation results
suggest a sample of T = 100 is not large enough to get close to the asymptotic
results. For example, the empirical sizes of LM0 and LM0,0 were 1.4% and 2.2%,
whereas the corresponding figures for the Wald test were 9.8% and 16.1%; these
differences in size suggest that a power comparison could be misleading.
3.8 Power
where ARE(t̂1, t̂2, d) > 1 implies that test t̂1 is asymptotically more powerful than
test t̂2. The ARE is a function of the ratio of the non-centrality parameters, so that t̂1
is asymptotically more powerful than t̂2 if |nc1| > |nc2| (because of the nature
of HA, large negative values of the test statistic lead to rejection, thus nci < 0).
DGM (2008, figures 1 and 2) show that the non-centrality parameters of tγ (d1 ),
tη (d1 ) and LM0 , say ncγ , ncη and nc0 , are similar for d ∈ (0.9, 1), but for d ∈ (0.3,
0.9) approximately, |ncη | > |ncγ | > |nc0 |, whereas for d ∈ (0.0, 0.3), |ncη | ≈ |ncγ |
> |nc0 |. In turn these imply that in terms of asymptotic power tη (d1 ) ≈ tγ (d1 )
≈ LM0 for d ∈ (0.9, 1); tη (d1 ) > tγ (d1 ) > LM0 for d ∈ (0.3, 0.9); and tη (d1 ) ≈
tγ (d1 ) > LM0 for d ∈ (0.0, 0.3). Perhaps of note is that there is little to choose
(asymptotically) between the tests close to the unit root for fixed alternatives.
3.8.2 Power against local alternatives
When the null hypothesis is that of a (single) unit root, so that Δyt = εt, then
the local alternative is:
Δ^d yt = εt (3.189)
d = 1 + δ/√T, δ < 0 (3.190)
where εt is iid with finite fourth moment. This is as in the LM test, with d0 = 1.
As T → ∞ then d → 1, which is the value under H0 , so that there is a sequence
of alternative distributions that get closer and closer to that under H0 as T → ∞.
The test statistic tη (d̂T ) is asymptotically equivalent to the Robinson/Tanaka LM
test which is, as noted above, UMPI under a sequence of local alternatives. To
see this equivalence it is helpful to consider the distributions of the test statistics
under the local alternatives.
Let the DGP be Δ^d yt = εt 1(t>0); then under the sequence of local alternatives
d = 1 + δ/√T, δ < 0, d1 ≥ 0.5, the limiting distributions are as follows:
LM
LM0 ⇒D N(δh0, 1)
h0 = (Σ_{j=1}^∞ j^{−2})^{1/2} = (π²/6)^{1/2} = 1.2825 (3.191)
FDF
tγ(d) ⇒D N(λ1, 1), d = 1 + δ/√T
λ1 = δh1, h1 = 1 (3.192)
tγ(d1) ⇒D N(λ2(d1), 1)
λ2(d1) = δh2(d1)
h2(d1) = Γ(d1)/[d1 Γ(2d1 − 1)^{1/2}], d1 > 0.5 (3.193)
EFDF
d̂T = d2 + op(T^{−κ}), κ > 0, d̂T > 0.5
The Aj(d) coefficients are defined in Equations (3.4) and (3.5); tγ(d) is computed
from the regression of Δyt on Δ^d yt−1, d = 1 + δ/√T; whereas tγ(d1) is computed
from the regression of Δyt on Δ^{d1} yt−1; tη(d̂T) is computed from the regression
of Δyt on zt−1(d̂T); and LM0 is given by (3.151). The input values d1 and d2 are
distinguished as they need not be the same in the two tests.
The effect of the local alternative on the distributions of the test statistics is
to shift each distribution by a non-centrality parameter λi , which depends on
δ and a drift function denoted, respectively, as h0 , h1 , h2 (d1 ) and h3 (d2 ); note
that the first two are just constants. In each case, the larger the shift in λi (dj )
relative to the null value, δ = 0, the higher the power of the test.
The drift functions are plotted in Figure 3.9. Note that h2 (d1 ) and h3 (d2 ) are
everywhere below h0 , but that h3 (d2 ) → h0 as d → 1, confirming the asymptotic
local equivalence of tη (d̂T ) and LM0 , given a consistent estimator of d for the
former. The drift functions of tγ (d1 ) and tη (d̂T ) cross over at d = 0.77, indicating
that tη (d̂T ) is not everywhere more powerful than tγ (d1 ), with the former more
powerful for d ∈ (0.5, 0.77), but tη (d̂T ) more powerful closer to the unit root.
Note also that there is a unique maximum value of d1 for h2(d1), say d1∗,
which occurs at d1∗ ≈ 0.69145 and, correspondingly, h2(d1∗) = 1.2456, at which
the power ratio of tγ(d1) relative to LM0 is 97%. What is also clear is that d1 = 0.5
is a particularly poor choice as an input for tγ(d1) since λ2(0.5) = 0. Comparing tγ(d)
and tγ(d1∗), the squared ratio of the relative non-centrality parameters, which
gives the asymptotic relative efficiency, is (1/1.2456)² = 0.6445; note that the
power of tγ(d1) is greater than that of tγ(d) for d1 ∈ (0.56, 1.0).
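The shape of the drift function is easy to verify numerically; a sketch assuming h2(d1) = Γ(d1)/[d1 Γ(2d1 − 1)^{1/2}], which is consistent with the reported maximum h2(d1∗) = 1.2456 at d1∗ ≈ 0.69145:

```python
import math

def h2(d1):
    """Drift function h2(d1) = Gamma(d1) / (d1 * Gamma(2*d1 - 1)**0.5),
    defined for d1 > 0.5; computed in logs for numerical stability."""
    return math.exp(math.lgamma(d1) - math.log(d1) - 0.5 * math.lgamma(2 * d1 - 1))

# grid search for the unique maximiser d1* on (0.5, 1]
grid = [0.5 + 0.0001 * i for i in range(1, 5001)]
d_star = max(grid, key=h2)
# d_star is close to 0.6915, with h2(d_star) close to 1.2456,
# below the LM drift bound h0 = (pi^2/6)**0.5 ~= 1.2825
```

Note that h2(1) = 1, matching h1 = 1, and h2(d1) → 0 as d1 → 0.5, matching the observation that d1 = 0.5 is a poor input choice.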
Figure 3.9 Drift functions h(d1): h0 (upper limit), h1, h2 and h3, plotted for d over (0.5, 1)
d1∗, is a function of d and is written as d1∗(d) to emphasise this point. LV (2006)
find that this relationship is in effect linear over the range of interest (d ≥ 0.5)
and can be fitted by d̂1∗(d) = −0.031 + 0.719d; thus, for a specific alternative
d = d1, the ‘best’ input value for the FDF regressor is d1∗ = −0.031 + 0.719d1. In
the composite alternative case, d1 is estimated by a T^κ-consistent estimator d̂T,
with κ > 0, and the optimal selection of the input value is:
d̂1∗ = −0.031 + 0.719d̂T
To emphasise, note that in this procedure it is not the consistent estimator d̂T
that is used to construct the regressor (nor is it d = d1 in the simple alternative),
but a simple linear transform of that value, which will be less than d̂T (or d1).
Using this value does not alter the limiting standard normal distribution (LV,
2006, lemma 1).
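The selection rule is a one-line transform; a sketch using the coefficients as fitted by LV (2006):

```python
def optimal_fdf_input(d_hat):
    """LV (2006) linear rule for the 'best' FDF input value:
    d1* = -0.031 + 0.719 * d_hat (fitted for the range d >= 0.5)."""
    return -0.031 + 0.719 * d_hat

d1_input = optimal_fdf_input(1.0)   # 0.688, close to the p = 0 optimum 0.69145
```

The rule always returns a value below its argument on the relevant range, as the text notes.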
In the more general case with ϕ(L)Δ^d yt = εt 1(t>0), the optimal d1 depends on
the order of ϕ(L) for local alternatives and the values of the ϕj coefficients for
fixed alternatives. LV (2006) suggest a prewhitening approach, in which the first
step is to estimate ϕ(L) from a p-th order autoregression using Δyt, that is, under
the null d = 1:
ϕ(L)Δyt = εt ⇒ Δyt = Σ_{j=1}^p ϕ̂j Δyt−j + ε̂t
where ˆ denotes a LS estimator. The second step is to form the prewhitened data,
y̆t = ϕ̂(L)yt = yt − Σ_{j=1}^p ϕ̂j yt−j. The final step is to run an AFDF regression,
but with y̆t rather than yt, that is:
Δy̆t = γΔ^{d1} y̆t−1 + Σ_{j=1}^p ϕj Δyt−j + ut∗(d1) (3.196)
(Note that this differs from DGM’s, 2002, AFDF in that it uses y̆t rather than
yt; also note that yt may be subject to a prior adjustment for deterministic
components.) As before, the test statistic is the t-type test associated with γ,
tγ(d1), and the inclusion of lags of Δyt controls the size of the test. The revised
AFDF regression is referred to as the prewhitened AFDF, PAFDF.
Table 3.3 Optimal d1, d1∗, for use in PAFDF tγ(d1) tests (for local alternatives)
p    0      1  2      3  4  5
d1∗  0.691  —  0.901  —  —  —
In the case of a sequence of local alternatives, the optimal d1 depends on the
order p of ϕ(L), but not on the values of the ϕj coefficients themselves, so that d1∗ ≈
0.69145 can be seen to correspond to the special case where p = 0, as shown in
Table 3.3 for p ≤ 5.
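The three prewhitening steps can be sketched as follows; a minimal sketch with LS throughout, Δ^{d1} applied via the truncated binomial expansion, and illustrative function names and AR order:

```python
import numpy as np

def fracdiff(x, d):
    """(1 - L)^d x via the truncated binomial expansion:
    pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j."""
    n = len(x)
    pi = np.ones(n)
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.array([pi[:t + 1] @ x[t::-1] for t in range(n)])

def pafdf_tstat(y, d1, p):
    """PAFDF t-statistic: (i) LS fit of an AR(p) to dy_t under the null d = 1;
    (ii) prewhiten the levels, y_breve = phihat(L) y; (iii) regress
    d(y_breve)_t on d^{d1} y_breve_{t-1} and p lags of dy_t; return the
    t-ratio on gamma."""
    dy = np.diff(y)
    # step (i): AR(p) on the first differences
    Z = np.column_stack([dy[p - j:-j] for j in range(1, p + 1)])
    phi = np.linalg.lstsq(Z, dy[p:], rcond=None)[0]
    # step (ii): prewhitened level series
    yb = y[p:] - sum(ph * y[p - j:-j] for j, ph in enumerate(phi, 1))
    # step (iii): AFDF regression with the prewhitened series
    lhs = np.diff(yb)
    z = fracdiff(yb, d1)[:-1]          # lagged d1-difference of y_breve
    lags = np.column_stack([dy[p - j:p - j + len(lhs)] for j in range(1, p + 1)])
    X = np.column_stack([z, lags])
    b, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    u = lhs - X @ b
    s2 = u @ u / (len(lhs) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return b[0] / se

rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal(300))   # a unit-root null for illustration
t_gamma = pafdf_tstat(y, d1=0.901, p=2)   # compare with the left tail of N(0, 1)
```

Under the null the t-ratio is approximately standard normal, so left-sided N(0, 1) critical values apply.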
In the case of fixed alternatives, there is no single function such as (3.195),
but a function that varies with p and the values of ϕj. LV (2006, section
4), to which the reader is referred for details, suggest an algorithm that leads
to the automatic selection of d1∗; they note that the method is computationally
intensive, but can offer power improvements in finite samples.
The estimators of d to be used are the local Whittle (LW) estimator and, in
contrast, the exact LW estimator of Shimotsu and Phillips (2005) and
Shimotsu (2010), referred to as d̂LW and d̂ELW, respectively (see Chapter 4). In
the case of d̂LW, the data are differenced and one is added back to the estimator,
whereas this is not necessary for d̂ELW, which is consistent for d in the nonsta-
tionary region. The FDF tests were relatively insensitive to the choice of d̂T, so
a comparison is only made for the EFDF tests.
The tests to be considered are:
FDF: tγ (d1∗ ) where d1∗ = 0.69145; tγ (d̂1∗ ) based on d̂LW ; and tγ (d̂LW ).
EFDF: tη (d̂LW ) and tη (d̂ELW ).
LM: Tanaka’s LM0 and the Breitung-Hassler regression version tα .
The DGPs are variations of the following:
Δ^d[yt − µt] = ut (3.197)
ŷt ≡ yt − µ̂t (3.199)
The first case to be considered is when µt = y0 and the estimators are as described
in Section 3.6. The simulations are, with one exception, invariant to the value
of y0. The exception arises when using the test based on d̂ELW. To see why,
compare this method with that based on d̂LW ; in the latter case, the data are
first differenced and then 1 is added back to the estimator, thus, like the LM
tests, invariance to a non-zero initial value is achieved by differencing. In the
case of d̂ELW , the estimator is asymptotically invariant to a non-zero initial value
for d ≥ 0.5, but Shimotsu (2010) recommends demeaning by subtracting y1 from
yt to protect against finite sample effects. Thus, if it is known that y0 = 0, there
would be no need to use the Shimotsu demeaned data; however, in general this
is not the case (and it is not the case here), so the data is demeaned both in the
estimation of d and in the construction of the tests. The results, based on 5,000
replications, are given in Table 3.4a for T = 200 and Table 3.4b for T = 500, and
illustrated in Figures 3.10a and 3.10b.
The central case is T = 200. The empirical sizes of tγ (d1∗ ), tγ (d̂LW ) and tα are
close to the nominal size of 5%, whereas tγ (d̂1∗ ), tη (d̂LW ) and tη (d̂ELW ) are slightly
oversized, whilst LM0 is undersized. Relative power differs across the parameter
space. In terms of size-adjusted power (given the differences in nominal size),
for d close to the unit root, say d ∈ (0.9, 1.0), tγ (d1∗ ) is best, but it loses a clear
advantage as d decreases, with a marginal advantage to tη (d̂LW ) and tα . The least
powerful of the tests is tγ (d̂LW ). Figure 3.10a shows how, with one exception, the
power of the tests clusters together, whereas the region d ∈ (0.9, 1.0) is shown
in Figure 3.10b.
The results for T = 500 (see Table 3.4b for a summary), show that the tests, with
the exception of tγ (d̂LW ), which has the lowest power, are now quite difficult to
Table 3.4a Power of various tests for fractional d (demeaned data), T = 200
d     tγ(d1∗)  tγ(d̂LW)  tγ(d̂1∗)  tη(d̂LW)  tη(d̂ELW)  LM0    tα
power
0.90  0.491    0.424     0.587     0.546     0.548      0.403  0.511
0.95  0.210    0.195     0.268     0.228     0.234      0.148  0.212
1.00  0.049    0.053     0.079     0.067     0.069      0.030  0.054
size-adjusted power
0.60  1.000    1.000     1.000     1.000     1.000      1.000  1.000
0.65  1.000    1.000     1.000     1.000     1.000      1.000  1.000
0.70  0.999    0.998     1.000     1.000     1.000      0.999  0.997
0.75  0.992    0.977     0.994     0.998     0.999      0.996  0.996
0.80  0.937    0.893     0.945     0.959     0.956      0.950  0.954
0.85  0.779    0.691     0.773     0.812     0.805      0.796  0.805
0.90  0.494    0.413     0.467     0.496     0.481      0.483  0.490
0.95  0.214    0.186     0.181     0.195     0.193      0.196  0.199
1.00  0.050    0.050     0.050     0.050     0.050      0.050  0.050
Table 3.4b Power of various tests for fractional d (demeaned data), T = 500
Note: d1∗ = 0.69145, d̂LW is the local Whittle estimator, d̂ELW is the exact local Whittle estimator.
Figure 3.10a Power of the tests, T = 200, d ∈ (0.6, 1.0)
Figure 3.10b Power of the tests, T = 200, d ∈ (0.9, 1.0)
distinguish across the range d ∈ (0.6, 1.0), although tγ(d̂1∗), tη(d̂LW) and tη(d̂ELW)
are still slightly oversized.
The other case likely to arise in practice is that of a possible non-zero deter-
ministic trend, which is illustrated with detrended data. The LM tests now
also require detrending, which is achieved by basing the test on ε̂t = Δyt − β̂1,
where β̂1 is the sample mean of Δyt. The FDF-type tests are based on (Shimotsu)
detrended data obtained as ŷt = yt − (β̂0 + β̂1t); see Section
3.6, Shimotsu (2010) and Chapter 4. The results are summarised in Tables 3.5a
and 3.5b; and power and size-adjusted (s.a.) power are shown in Figures 3.11a and 3.11b for
T = 200.
Starting with T = 200, all of the tests are oversized, but the LM0 test is only
marginally so at 5.5% followed by tγ (d̂LW ) at 6.6% (see the upper panel of Table
3.5a); the other tests have broadly twice their nominal size. The oversizing indi-
cates that the finite sample 5% critical value is to the left of the nominal critical
value. One way of overcoming the inaccuracy of the nominal critical values
is to bootstrap the test statistics (see UR, Vol. 1, chapter 8). An approximate
estimate of the finite sample 5% cv can be obtained by finding the normal dis-
tribution that delivers a 10% rejection rate with a critical value of −1.645; such
a distribution has a mean of −0.363 (rather than 0) and a (left-sided) 5% cv of
−2.0. Overall, in terms of s.a. power, LM0 is the best of the tests, although the dif-
ferences between the EFDF tests and the LM tests are relatively slight and tγ(d̂1∗)
is also competitive (see Figure 3.11a); when the alternative is close to the unit
root, tγ(d1∗) and tγ(d̂LW) also become competitive tests (see Figure 3.11b), whilst
away from the unit root tγ(d̂LW) is least powerful. Thus, LM0 is recommended
in this case as it is close to its nominal size and the best overall in terms of (s.a.)
power.
The general oversizing results for T = 200 suggest that it would be of interest to
increase the sample size for a better picture of what is happening to the empirical
size of the tests, and Table 3.5b presents a summary of the results for T = 1,000.
In this case, tγ(d1∗), tγ(d̂LW) and tα are now much closer to their nominal size
and LM0 maintains its relative fidelity; there are improvements in the empirical
size of tγ(d̂1∗), tη(d̂LW) and tη(d̂ELW), but these remain oversized. The s.a. power
confirms that the LM tests still have a marginal advantage in terms of power,
whereas tγ(d̂LW) is clearly the least powerful. Overall, the advantage does again
seem to lie with the LM-type tests, LM0 and tα.
The various tests for a fractional root are illustrated with a time series on US
wheat production; the data is in natural logarithms and is annual for the period
1866 to 2008, giving a sample of 143 observations. The data, yt, and the data
detrended by a regression on a constant and a linear trend, ŷt, are graphed
Table 3.5a Power of various tests for fractional d (detrended data), T = 200
d     tγ(d1∗)  tγ(d̂LW)  tγ(d̂1∗)  tη(d̂LW)  tη(d̂ELW)  LM0    tα
power
0.90  0.541    0.449     0.650     0.630     0.630      0.453  0.580
0.95  0.270    0.219     0.345     0.311     0.311      0.187  0.274
1.00  0.082    0.066     0.124     0.108     0.115      0.055  0.093
size-adjusted power
0.60  1.000    1.000     1.000     1.000     1.000      1.000  1.000
0.65  1.000    1.000     1.000     1.000     1.000      1.000  1.000
0.70  0.999    0.998     1.000     1.000     1.000      0.999  1.000
0.75  0.977    0.974     0.990     0.993     0.991      0.992  0.993
0.80  0.899    0.877     0.928     0.930     0.929      0.925  0.927
0.85  0.698    0.655     0.735     0.735     0.733      0.740  0.742
0.90  0.422    0.381     0.430     0.425     0.424      0.437  0.427
0.95  0.176    0.167     0.170     0.165     0.166      0.178  0.174
1.00  0.050    0.050     0.050     0.050     0.050      0.050  0.050
Table 3.5b Power of various tests for fractional d, (detrended data), T = 1,000
Figure 3.11a Power of the tests (detrended data), T = 200, d ∈ (0.6, 1.0)
Figure 3.11b Power of the tests (detrended data), T = 200, d ∈ (0.9, 1.0)
Figure 3.12a US wheat production (natural logarithms), 1866–2008
Figure 3.12b Detrended US wheat production (residuals from a linear trend), 1866–2008
in Figures 3.12a and 3.12b, respectively; the data is obviously trended, whilst
the detrended data shows evidence of a long cycle and persistence. A standard
approach would be to apply a unit root test to either yt or ŷt. To illustrate,
we calculate the Shin and Fuller (1998) exact ML test for a unit root, denoted
LRUC,β and based on the unconditional likelihood function, applied to yt, and
the version of the ADF test appropriate to a detrended alternative, that is, τ̂β
(see UR, Vol. 1, chapters 3 and 6).
ML estimation of an ARMA(p, q) model considering all lags to a maximum
of p = 3 and q = 3 led to an ARMA(1, 1) model selected by AIC and BIC. An
ADF(1) was selected based on a marginal t selection criterion using a 10% signif-
icance level. The estimation and test results are summarized in Table 3.6, with
‘t’ statistics in parentheses.
The results show that whilst the ML test does not reject the null hypothesis of a
unit root, the ADF test leads to the opposite conclusion and both are reasonably
robust in that respect to variations in the significance level.
ADF(1) estimation
Δyt = −0.317 yt−1 − 0.142 Δyt−1
      (−4.72)      (−1.71)
τ̂β = −4.52, 5% cv = −3.44
Table 3.7 Tests with a fractionally integrated alternative; p = 0 and p = 2 for augmented
tests
p = 2
            tγ(d1∗)  tγ(d̂LW)  tγ(d̂ELW)  tη(d̂LW)  tη(d̂ELW)  LM0,0  tα
test value  −2.54    −2.43     −3.23      −1.93     −3.19      −4.88  −3.00
Notes: d1∗ = 0.691 if no serial dependence is allowed (p = 0); d1∗ = 0.901 for p = 2; throughout, the
limiting 5% cv is −1.645, from the standard normal distribution; the two-step estimation procedure is used
for tη(d̂LW) and tη(d̂ELW) when p = 2 (see Section 3.5.7); similar results were obtained from nonlinear
estimation.
The tests in this chapter allow for the possibility that the alternative is a
fractionally integrated, but still nonstationary, process. The test results are sum-
marised in Table 3.7. The ML and ADF models suggest that the tests have to take
into account the possibility of serially dependent errors, but for comparison the
tests are first presented in their simple forms.
As a brief recap, in the case of weakly dependent errors the test statistics are
obtained as follows. In the case of the (LV version of the) FDF test, the suggested
augmented regression (see Section 3.8.3) is:
Δy̆t = γΔ^{d1} y̆t−1 + Σ_{j=1}^p ϕj Δŷt−j + ut∗(d1) (3.200)
This is referred to as the prewhitened AFDF, PAFDF, where y̆t = ŷt − Σ_{j=1}^p ϕ̂j ŷt−j, and
ϕ̂(L), a p-th order polynomial, is obtained from fitting an AR(p) model to Δŷt,
where ŷt is the detrended data. The test statistic is the t-type test associated with
γ, tγ(d1), and the optimal d1 depends on the order of ϕ(L) for local alternatives
(and the values of the ϕj coefficients for fixed alternatives); see Table 3.3. We find
p = 2 is sufficient to ‘whiten’ Δŷt. Reference to Table 3.3 indicates that d1∗ = 0.901
for the choice of p = 2.
For the EFDF test, the augmented regression is:
Δŷt = η[ϕ(L)zt−1(d̂T)] + [1 − ϕ(L)]Δŷt + vt (3.201)
The test statistic is the t test associated with η, tη(d̂T). This regression is non-
linear in the coefficients because of the multiplicative form ηϕ(L), and the ϕ(L)
coefficients also enter through the second set of terms. As noted above, there are
several ways to deal with this and the illustration uses the two-stage procedure
(see Section 3.5.7); see Q3.11 for the nonlinear estimation method. For p = 2,
the first stage comprises estimation of the AR(2) coefficients obtained from:
Δ^{d̂T} ŷt = ϕ̂1 Δ^{d̂T} ŷt−1 + ϕ̂2 Δ^{d̂T} ŷt−2 + ξ̂t
The second stage is then the LS regression:
Δŷt = η wt−1 + ϕ1 Δŷt−1 + ϕ2 Δŷt−2 + v̂t
where wt−1 = (1 − ϕ̂1L − ϕ̂2L²)zt−1(d̂T), with ϕ̂1 and ϕ̂2 obtained from the first
stage.
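The two-stage mechanics can be sketched as follows; the EFDF regressor zt−1(d̂T) is as defined earlier in the chapter and is supplied as an input here, and the function names are illustrative:

```python
import numpy as np

def fracdiff(x, d):
    """(1 - L)^d x via the truncated binomial expansion."""
    n = len(x)
    pi = np.ones(n)
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.array([pi[:t + 1] @ x[t::-1] for t in range(n)])

def efdf_two_stage(y, d_hat, z):
    """Two-stage EFDF with AR(2) dynamics; z[t] holds the EFDF regressor
    z_t(d_hat), built as defined in the chapter and supplied by the caller.
    Stage 1: LS fit of an AR(2) to x_t = (1 - L)^{d_hat} y_t.
    Stage 2: regress dy_t on w_{t-1} = (1 - phi1 L - phi2 L^2) z_{t-1}
    and dy_{t-1}, dy_{t-2}; return the t-ratio on eta."""
    xd = fracdiff(y, d_hat)
    # stage 1: AR(2) coefficients phi1_hat, phi2_hat
    A = np.column_stack([xd[1:-1], xd[:-2]])
    phi = np.linalg.lstsq(A, xd[2:], rcond=None)[0]
    # stage 2: filtered regressor and LS regression
    dy = np.diff(y)
    w = z[2:-1] - phi[0] * z[1:-2] - phi[1] * z[:-3]
    X = np.column_stack([w, dy[1:-1], dy[:-2]])
    lhs = dy[2:]
    b, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    v = lhs - X @ b
    s2 = v @ v / (len(lhs) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return b[0] / se

rng = np.random.default_rng(3)
y = np.cumsum(rng.standard_normal(300))   # a unit-root null for illustration
# stand-in regressor for the mechanics only; use the chapter's z_{t-1}(d_hat)
z = fracdiff(y, 0.7)
t_eta = efdf_two_stage(y, d_hat=0.7, z=z)
```

The point of the two stages is that the nonlinear restriction ηϕ(L) in (3.201) is replaced by a linear regression once ϕ̂1 and ϕ̂2 are fixed from the first stage.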
The relevant version of the LM test is LM0,0, given by:
LM0,0 = (√T/ω̂0) Σ_{j=1}^{T−1} j^{−1} ρ̂(j) ⇒D N(0, 1) under the null
ρ̂(j) = Σ_{t=j+1}^T ε̂t ε̂t−j / Σ_{t=1}^T ε̂t²
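A minimal sketch of the feasible statistic, assuming the residuals ε̂t are available and taking ω̂0 = (π²/6)^{1/2} as the default no-short-run-dynamics value:

```python
import numpy as np

def lm00(eps, omega0=(np.pi ** 2 / 6) ** 0.5):
    """Feasible LM statistic: (sqrt(T)/omega0) * sum_{j=1}^{T-1} j^{-1} rho(j),
    with rho(j) = sum_{t=j+1} e_t e_{t-j} / sum_t e_t^2.
    Under the null, LM_{0,0} => N(0, 1); omega0 defaults to
    sqrt(pi^2/6) ~= 1.2825, the no-short-run-dynamics case."""
    T = len(eps)
    denom = np.sum(eps ** 2)
    s = sum(np.sum(eps[j:] * eps[:-j]) / denom / j for j in range(1, T))
    return np.sqrt(T) / omega0 * s

rng = np.random.default_rng(1)
stat = lm00(rng.standard_normal(400))   # white-noise residuals: no rejection expected
```

With serially dependent errors, ω̂0 would instead be estimated under the null, as described in the text.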
The various test results are illustrated with p = 2 and summarised in Table 3.7.
The results suggest that the unit root null should be rejected at usual significance
levels; moreover the semi-parametric estimates of d, which are d̂LW = 0.765 and
d̂ELW = 0.594, with m = T0.65 , tend to confirm this view. (Note that as d̂ELW is
further from the unit root than d̂LW , the test statistics are more negative when
d̂ELW is used.) The simulation results reported in Table 3.5 suggested that LM0
maintained its size (at least at 5%), whereas the other tests had an actual size
about twice the nominal size, that is the finite sample critical value was to the
left of the nominal critical value. The test results are generally robust to likely
variations due to the oversizing found using critical values from the standard
normal distribution.
It is possible to set up different null hypotheses of interest; one in particular is
H0: d = 0.5 against HA: d ∈ (0.5, 1], which is a test of borderline nonstationarity,
and it is simple enough to apply the various tests to this situation; for details
see Tanaka (1999) and Breitung and Hassler (2002) for the LM tests and DGM
(2009) for the extension of the EFDF test.
This chapter has introduced some key concepts to extend the range of inte-
grated processes to include the case of fractional d. These are of interest in
themselves and for their use in developments of the I(d) literature to fraction-
ally cointegrated series. The first question to be addressed is: can a meaning
be attached to the fractional difference operator applied to a series of obser-
vations? The answer is yes and relies upon an expansion of (1 − L)d using the
binomial theorem either directly or through a gamma function representation,
whilst the treatment of the initial condition gives rise to two types of fraction-
ally integrated processes. Corresponding developments have been made in the
mathematical analysis of fractionally integrated processes in continuous time;
see for example Oldham and Spanier (1974), Kalia (1993) and Miller and Ross
(1993).
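The binomial expansion of (1 − L)^d is straightforward to apply to a finite sample; a minimal sketch of type II fractional differencing (pre-sample values set to zero):

```python
import numpy as np

def fracdiff(y, d):
    """Apply (1 - L)^d via the binomial expansion: coefficients
    pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j, truncated at the
    sample start (type II: y_t = 0 for t <= 0)."""
    n = len(y)
    pi = np.ones(n)
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.array([pi[:t + 1] @ y[t::-1] for t in range(n)])

y = np.array([1.0, 2.0, 3.0, 4.0])
d1y = fracdiff(y, 1.0)   # d = 1: first differences, with y_1 retained at t = 1
# applying d = 0.4 then d = 0.6 reproduces d = 1, since the exponents add
```

For integer d the coefficients terminate (e.g. for d = 1, π = (1, −1, 0, . . .)); for fractional d they decay hyperbolically, which is the source of long memory.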
Most of the analysis of this chapter was in the time domain, with the fre-
quency domain approach to be considered in the following chapter. Once
fractional d is allowed, the range of hypotheses of interest is extended quite nat-
urally. The set-up is no longer simply that of nonstationarity, d = 1, to be tested
against stationarity, d = 0; one obvious generalisation is d = 1 against d ∈ [0, 1),
but others may be of interest in context, for example d = 0 against d ∈ (0, 1]
or d = 0.5 (borderline nonstationary) against d ∈ (0.5, 1]. Processes with d > 0
have long memory in the sense that the autocorrelations are not summable,
declining hyperbolically for 0 < d < 1, whilst a process with d ≥ 0.5 is also
nonstationary.
One way to approach the testing problem is to extend the familiar frame-
work of Dickey and Fuller, as suggested by Dolado, Gonzalo and Mayoral (2002)
and further extended by Lobato and Velasco (2006, 2007) and Dolado, Gonzalo and Mayoral (2