Unit Root Tests in Time Series - Ebook
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
Titles include:
Simon P. Burke and John Hunter
MODELLING NON-STATIONARY TIME SERIES
Michael P. Clements
EVALUATING ECONOMETRIC FORECASTS OF ECONOMIC AND FINANCIAL VARIABLES
Leslie Godfrey
BOOTSTRAP TESTS FOR REGRESSION MODELS
Terence C. Mills
MODELLING TRENDS AND CYCLES IN ECONOMIC TIME SERIES
Kerry Patterson
A PRIMER FOR UNIT ROOT TESTING
Kerry Patterson
UNIT ROOT TESTS IN TIME SERIES VOLUME 1
Key Concepts and Problems
Kerry Patterson
UNIT ROOT TESTS IN TIME SERIES VOLUME 2
Extensions and Developments
Kerry Patterson
may be liable to criminal prosecution and civil claims for damages.
The author has asserted his right to be identified as the author of this
work in accordance with the Copyright, Designs and Patents Act 1988.
First published 2012 by
PALGRAVE MACMILLAN
Palgrave Macmillan in the UK is an imprint of Macmillan Publishers Limited,
registered in England, company number 785998, of Houndmills, Basingstoke,
Hampshire RG21 6XS.
Palgrave Macmillan in the US is a division of St Martin’s Press LLC,
175 Fifth Avenue, New York, NY 10010.
Palgrave Macmillan is the global academic imprint of the above companies
and has companies and representatives throughout the world.
Palgrave® and Macmillan® are registered trademarks in the United States,
the United Kingdom, Europe and other countries.
ISBN: 978–0–230–25026–0 hardback
ISBN: 978–0–230–25027–7 paperback
This book is printed on paper suitable for recycling and made from fully
managed and sustained forest sources. Logging, pulping and manufacturing
processes are expected to conform to the environmental regulations of the
country of origin.
A catalogue record for this book is available from the British Library.
A catalog record for this book is available from the Library of Congress.
10 9 8 7 6 5 4 3 2 1
21 20 19 18 17 16 15 14 13 12
Printed and bound in Great Britain by
CPI Antony Rowe, Chippenham and Eastbourne
Detailed Contents vi
List of Tables xx
List of Figures xxiii
Preface xxix
2.3.2.i Serially correlated errors 49
2.3.2.ii Simulation results 51
2.4 The range unit root test 51
2.4.1 The range and new ‘records’ 52
2.4.1.i The forward range unit root test 53
2.4.1.ii Robustness of RUR^F 54
2.4.2 The forward–backward range unit root test 55
2.4.3 Robustness of RUR^F and RUR^FB 56
2.4.4 The range unit root tests for trending alternatives 59
2.5 Variance ratio tests 60
2.5.1 A basic variance ratio test 61
2.5.2 Breitung variance ratio test 64
2.6 Comparison and illustrations of the tests 66
2.6.1 Comparison of size and power 67
2.6.2 Linear or exponential random walks? 69
2.6.3 Empirical illustrations 70
2.6.3.i Ratio of gold–silver price (revisited) 70
2.6.3.ii Air revenue passenger miles (US) 71
2.7 Concluding remarks 73
Questions 74
3 Fractional Integration 76
Introduction 76
3.1 A fractionally integrated process 78
3.1.1 A unit root process with fractionally integrated noise 78
3.1.2 Binomial expansion of (1 − L)d 79
3.1.2.i AR coefficients 79
3.1.2.ii MA coefficients 80
3.1.2.iii The fractionally integrated model in terms
of the Gamma function 80
3.1.3 Two definitions of an I(d) process, d fractional 82
3.1.3.i Partial summation 82
3.1.3.ii Direct definition 83
3.1.4 The difference between type I and type II processes 84
3.2.1.ii Autocorrelations 88
3.2.1.iii Inverse autocorrelations 89
3.2.2 Graphical properties of some simple ARFIMA
models 89
3.3 What kind of models generate fractional d? 93
3.3.1 The error duration model (Parke, 1999) 93
3.3.1.i The model 93
3.3.1.ii Motivation: the survival rate of firms 94
3.3.1.iii Autocovariances and survival probabilities 94
3.3.1.iv The survival probabilities in a long-memory
process 96
3.3.1.v The survival probabilities in a short-memory,
AR(1), process 97
3.3.2 An example: the survival rate for US firms 97
3.3.3 Error duration and micropulses 99
3.3.4 Aggregation 99
3.3.4.i Aggregation of ‘micro’ relationships 100
3.3.4.ii The AR(1) coefficients are ‘draws’ from the
beta distribution 101
3.3.4.iii Qualifications 101
3.4 Dickey-Fuller tests when the process is fractionally
integrated 102
3.5 A fractional Dickey-Fuller (FDF) test for unit roots 105
3.5.1 FDF test for fractional integration 106
3.5.2 A feasible FDF test 107
3.5.3 Limiting null distributions 107
3.5.4 Serially correlated errors: an augmented FDF, AFDF 108
3.5.5 An efficient FDF test 110
3.5.6 EFDF: limiting null distribution 112
3.5.7 Serially correlated errors: an augmented EFDF, AEFDF 113
3.5.8 Limiting null distribution of tη (d̂T ) 114
3.6 FDF and EFDF tests: deterministic components 114
3.7 Locally best invariant (LBI) tests 118
3.7.1 An LM-type test 118
3.8 Power 127
3.8.1 Power against fixed alternatives 127
3.8.2 Power against local alternatives 128
3.8.3 The optimal (power-maximising) choice of d1
in the FDF test(s) 129
3.8.4 Illustrative simulation results 131
3.9 Example: US wheat production 135
3.10 Concluding remarks 141
Questions 142
Appendix 3.1 Factorial expansions for integer and non-integer d 149
A3.1 What is the meaning of (1 − L)d for fractional d? 149
A3.2 Applying the binomial expansion to the fractional difference
operator 150
A3.3 The MA coefficients in terms of the gamma function 151
Appendix 3.2 FDF test: assuming known d1 152
4.3.1.vi Testing the unit root hypothesis using the
GPH estimator 171
4.4 Pooling and tapering 171
4.4.1 Pooling adjacent frequencies 172
4.4.2 Tapering 172
4.5 Variants of GPH estimation using pooling and tapering 174
4.5.1 Pooled and pooled and tapered GPH 174
4.5.2 A bias-adjusted log-periodogram estimator 175
4.5.2.i The AG bias-reduced estimator 175
4.5.2.i.a The estimator 175
4.5.2.i.b Asymptotic properties of the AG
estimator 178
4.5.2.i.c Bias, rmse and mopt comparisons;
comparisons of asymptotics 179
4.6 Finite sample considerations and feasible estimation 183
4.6.1 Finite sample considerations 185
4.6.2 Feasible estimation (iteration and ‘plug-in’ methods) 186
4.7 A modified LP estimator 188
4.7.1 The modified discrete Fourier transform and
log-periodogram 189
4.7.2 The modified DFT and modified log-periodogram 190
4.7.3 The modified estimator 191
4.7.4 Properties of the modified estimator 191
4.7.4.i Consistency 191
4.7.4.ii Asymptotic normality for d ∈ [0.5, 2] if the
initial condition is zero or known 192
4.7.4.iii Reported simulation results 192
4.7.5 Additional simulations 193
4.7.6 Testing the unit root hypothesis 197
4.8 The approximate Whittle contrast and the Gaussian
semi-parametric estimator 200
4.8.1 A discrete contrast function (Whittle’s contrast function) 200
4.8.2 The Whittle contrast as a discrete approximation
to the likelihood function 201
4.8.6 The presence of a trend 206
4.8.7 Consistency and the asymptotic distribution of d̂LW 207
4.8.8 More on the LW estimator for d in the nonstationary
range 207
4.9 Modifications and extensions of the local Whittle estimator 209
4.9.1 The modified local Whittle estimator 209
4.9.2 The ‘exact’ local Whittle estimator 210
4.9.2.i A problem 210
4.9.2.ii The exact LW estimator, with known initial
value: d̂ELW 211
4.9.2.iii Properties of the exact LWE 213
4.9.2.iv The exact LW estimator with unknown
initial value: d̂FELW,μ 214
4.9.2.v Properties of d̂FELW,μ 215
4.9.2.vi Allowing for a deterministic polynomial
trend 215
4.10 Some simulation results for LW estimators 216
4.11 Broadband – or global – estimation methods 221
4.12 Illustrations 223
4.12.1 The US three-month T-bill 224
4.12.2 The price of gold in US$ 228
4.13 Concluding remarks 231
Questions 232
Appendix: The Taylor series expansion of a logarithmic function 238
white noise case 254
5.2.10.iii ESTAR(2) 254
5.2.11 Existence and stability of stationary points 255
5.2.11.i Equilibrium of the STAR process 255
5.2.11.ii Stability of singular points and limit cycles 256
5.2.11.iii Some special cases of interest 258
5.3 LSTAR transition function 259
5.3.1 Standardisation of the transition variable in the
LSTAR model 260
5.3.2 The LSTAR model 261
5.4 Further developments of STAR models: multi-regime models 264
5.5 Bounded random walk (BRW) 265
5.5.1 The basic model 265
5.5.2 The bounded random walk interval 266
5.5.3 Simulation of the BRW 269
5.6 Testing for nonstationarity when the alternative is a
nonlinear stationary process 269
5.6.1 The alternative is an ESTAR model 269
5.6.1.i The KSS test for a unit root 269
5.6.1.ii A joint test 271
5.6.1.iii An Inf-t test 271
5.6.1.iv Size and power of unit root tests against
ESTAR nonlinearity 274
5.6.1.v Size 274
5.6.1.vi Power 274
5.6.2 Testing for nonstationarity when the alternative is a
nonlinear BRW 281
5.7 Misspecification tests to detect nonlinearity 283
5.7.1 Teräsvirta’s tests 283
5.7.2 The Escribano-Jordá variation 286
5.8 An alternative test for nonlinearity: a conditional mean
encompassing test 288
5.8.1 Linear, nonnested models 288
5.8.2 Nonlinear, nonnested models: ESTAR and LSTAR 290
5.10.2 Direct nonlinear estimation 302
5.10.3 Simulation set-up 303
5.10.3.i Setting starting values for direct estimation 303
5.10.3.ii Grid search 303
5.10.3.iii The error variance and admissibility rules 304
5.10.3.iv The set-up and simulation results 304
5.11 Estimation of the BRW 308
5.11.1 Simulation set-up 309
5.11.2 Simulation results 310
5.12 Concluding remarks 314
Questions 316
6.3.3 Regime-dependent heteroscedasticity 343
6.4 Testing for nonstationarity in TAR and MTAR models 344
6.4.1 Enders-Granger model, 2RTAR and 2RMTAR 344
6.4.1.i Known threshold 345
6.4.1.ii Unknown threshold 348
6.4.2 Testing for nonstationarity in a 3RTAR 348
6.4.2.i Testing for two unit roots conditional on a
unit root in the inner regime 348
6.4.2.ii Testing for three unit roots 351
6.5 Test for a threshold effect in a stationary TAR model 352
6.5.1 A stationary 2RTAR 353
6.5.2 Test for a threshold effect in a stationary multi-regime
TAR, with unknown regime separation 355
6.5.3 A test statistic for a null hypothesis on κ 356
6.6 Tests for threshold effects when stationarity is not
assumed (MTAR) 356
6.6.1 Known regime separation 357
6.6.2 Unknown regime separation 357
6.7 Testing for a unit root when it is not known whether
there is a threshold effect 360
6.8 An empirical study using different nonlinear models 361
6.8.1 An ESTAR model for the dollar–sterling exchange rate 361
6.8.1.i Initial tests 361
6.8.1.ii Estimation 363
6.8.2 A bounded random walk model for the dollar–sterling
exchange rate 367
6.8.2.i Estimation results 367
6.8.2.ii An estimated random walk interval (RWI) 368
6.8.3 Two- and three-regime TARs for the exchange rate 369
6.8.4 A threshold effect? 372
6.8.5 The unit root null hypothesis 373
6.9 Concluding remarks 374
Questions 376
7.2.2 Innovation outlier (IO) 389
7.2.2.i IO Model 1 389
7.2.2.ii IO Model 2 391
7.2.2.iii IO Model 3 392
7.3 Development: higher-order dynamics 393
7.3.1 Stationary processes (the alternative hypothesis) 393
7.3.1.i Model 1 set-up 393
7.3.1.ii Model 2 set-up 394
7.3.1.iii Model 3 set-up 395
7.3.2 Higher-order dynamics: AO models 395
7.3.2.i Summary of models: AO case 396
7.3.3 Higher-order dynamics: IO models 397
7.3.3.i Summary of models: IO case 399
7.3.4 Higher-order dynamics: distributed outlier
(DO) models 400
7.3.4.i Model 1 400
7.3.4.ii Model 2 402
7.3.4.iii Model 3 403
7.3.4.iv Summary of models: DO case 403
7.4 AO or IO? 404
7.4.1 DO to AO 404
7.4.2 DO to IO 405
7.4.3 Distributions of test statistics 405
7.4.4 Model selection by log likelihood/information criteria 408
7.5 Example AO(1), IO(1) and DO(1) 410
7.5.1 Estimation 410
7.5.2 Bootstrapping the critical values 411
7.6 The null hypothesis of a unit root 412
7.6.1 A break under the null hypothesis 412
7.6.2 AO with unit root 412
7.6.2.i AO Model 1 413
7.6.2.ii AO Model 2 413
7.6.2.iii AO Model 3 413
7.6.3 IO with unit root 413
7.7.1 The AO models 416
7.7.2 IO Models 417
7.7.2.i IO Model 1 417
7.7.2.ii IO Model 2 418
7.7.2.iii IO Model 3 419
7.7.3 DO Models 420
7.7.3.i DO Model 1 420
7.7.3.ii DO Model 2 420
7.7.3.iii DO Model 3 421
7.8 Critical values 422
7.8.1 Exogenous versus endogenous break 422
7.8.2 Critical values 422
7.9 Implications of a structural break for DF tests 423
7.9.1 Spurious non-rejection of the null hypothesis of a
unit root 423
7.9.2 The set-up 423
7.9.3 Key results 424
7.9.3.i Data generated by Model 1 425
7.9.3.ii Data generated by Model 1a 426
7.9.3.iii Data generated by Model 2 427
7.9.3.iv Data generated by Model 3 428
7.10 Concluding remarks 429
Questions 431
8.2.1.ii The decision rule 448
8.2.1.iii Critical values 448
8.2.2 Significance of the coefficient(s) on the dummy
variable(s) 448
8.2.2.i The ‘VP criteria’ 448
8.2.2.ii Critical values 450
8.2.3 A break under the null hypothesis 451
8.2.3.i Invariance of standard test statistics 451
8.3 Further developments on break date selection 453
8.3.1 Lee and Strazicich (2001): allowing a break under
the null 454
8.3.1.i Simulation results 454
8.3.1.ii Use BIC to select the break date? 456
8.3.2 Harvey, Leybourne and Newbold (2001) 457
8.3.2.i The break date is incorrectly chosen 457
8.3.3 Selection of break model under the alternative 457
8.3.3.i Misspecifying the break model: what are the
consequences? 458
8.3.3.ii No break under the null 459
8.3.3.iii Break under the null 460
8.3.3.iv Simulation results 460
8.3.3.v Implications of choosing the wrong model 463
8.4 What kind of break characterises the data? 463
8.4.1 Methods of selecting the type of break 464
8.5 Multiple breaks 469
8.5.1 AO Models 470
8.5.2 IO Models 472
8.5.3 Grid search over possible break dates 473
8.5.3.i Selecting the break dates 473
8.5.3.ii Test statistics 473
8.6 Illustration: US GNP 474
8.6.1 A structural break? 475
8.6.2 Which break model? 478
8.6.3 The break date? 479
8.6.3.i Break dates suggested by different criteria 479
8.6.3.ii AO Model 2 and IO Models 1 and 2: estimation
results with BIC date of break 480
8.6.3.iii The estimated long-run trend functions 480
8.6.3.iii.a AO Model 2 482
8.6.3.iii.b IO Model 1 482
8.6.3.iii.c IO Model 2 482
8.6.4 Inference on the unit root null hypothesis 484
8.6.4.i AO Model 2 484
8.6.4.ii IO Models 484
8.6.4.ii.a IO Model 1 484
8.6.4.ii.b IO Model 2 487
8.6.4.ii.c A break under null 487
8.6.5 Overall 488
8.6.6 Two breaks 489
8.6.6.i IO(2, 3) model 489
8.7 Concluding remarks 492
Questions 493
References 528
Subject Index 546
5% nominal level) 33
2.2 Critical values for KM test statistics 38
2.2a Critical values for V1 and V2 38
2.2b Critical values for U1 and U2 38
2.3 Sensitivity of critical values for V1, V2; T = 100, ψ1 = 0.0, ψ1 = 0.8 39
2.4 Power of KM nonnested tests for 5% nominal size 40
2.5 Regression details: gold–silver price ratio 42
2.6 Regression details: world oil production 44
2.7 Size of δ̂r,μ and τ̂r,μ using quantiles for δ̂μ and τ̂μ 46
2.8 Critical values of the rank-based DF tests δ̂r,μ , τ̂r,μ , δ̂r,β and τ̂r,β 47
2.9 Quantiles for the BG rank-score tests 50
2.10 Critical values of the forward range unit root test, RUR^F(α, T) 54
2.11 Critical values of the empirical distribution of RUR^FB(α, T) 56
2.12 Actual and nominal size of DF and range unit root tests when
εt ∼ t(3) and niid(0, 1) quantiles are used 57
2.13 Critical values of νrat,j 66
2.14 Summary of size and power for rank, range and variance
ratio tests 68
2.15 Levels or logs, additional tests: score and variance ratio tests
(empirical size) 70
2.16 Tests for a unit root: ratio of gold to silver prices 71
2.17 Regression details: air passenger miles, US 72
2.18 Tests for a unit root: air passenger miles 72
2.19 KM tests for linearity or log-linearity: air passenger miles 73
3.1 Empirical survival rates Sk and conditional survival rates
Sk /Sk−1 for US firms 97
3.2 Power of ADF τ̂μ test for fractionally integrated series 105
3.3 Optimal d1 , d1∗ , for use in PAFDF t̂γ (d1 ) tests (for local
alternatives) 131
3.4a Power of various tests for fractional d, (demeaned data), T = 200 133
3.4b Power of various tests for fractional d, (demeaned data), T = 500 133
3.5a Power of various tests for fractional d, (detrended data), T = 200 136
3.5b Power of various tests for fractional d, (detrended data), T = 1,000 136
3.6 Estimation of models for US wheat production 139
3.7 Tests with a fractionally integrated alternative; p = 0 and p = 2
for augmented tests 139
plug-in version of the AG estimator, d̂PAG (r); d = 0, T = 512 188
4.5 Simulation results for d̂GPH and d̂MGPH with AR(1) short-run
dynamics, φ1 = 0.3, T = 512 193
4.6 Properties of d̂LW for stationary and nonstationary cases 208
4.7 LW estimators considered in the simulations 216
4.8 Semi-parametric estimators considered in this section 223
4.9 Estimates of d for the US three-month T-bill rate 225
4.10 Estimates of d for the price of gold 229
A4.1 m_GPH^opt and RMSE at m_GPH^opt for d̂AG (r), r = 0, 1 234
A4.2 Ratio of asymptotic bias for d̂AG (0) and d̂AG (1),
φ1 = 0.9 235
5.1 Critical values of the KSS test for nonlinearity tNL 271
5.2 Critical values of the Inf-t test for nonlinearity 274
5.3 Empirical size of unit root tests for ESTAR nonlinearity: 5%
nominal size 275
5.4 Power of some unit root tests against an ESTAR alternative 275
5.5 Power of τ̂μ and τ̂μ^ws against the BRW alternative for different
values of α(i) and σε2 282
5.6 Nonlinearity testing: summary of null and alternative hypotheses 286
5.7 Summary of illustrative tests for nonlinearity 288
5.8 Estimation of the BRW model 311
5.9 Quantiles of the LR test statistic and quantiles of χ2 (1) 314
6.1 Integers for the ergodicity condition in a TAR model 330
6.2 Quantiles for testing for a unit root in a 2RTAR model and in a
2RMTAR model 346
6.3 Simulated quantiles of W(0, 0) and exp[W(0, 0)/2] 351
6.4 Unit root test statistics for the US: UK real exchange rate 362
6.5a Tests for nonlinearity (delay parameter = 1) 363
6.5b Tests for nonlinearity (delay parameter = 2) 363
6.6 Estimation of ESTAR models for the dollar–sterling real
exchange rate 365
6.7 Estimation of a BRW model for the dollar–sterling real
exchange rate 368
6.8 Estimation of the (linear) ADF(2) model for the dollar–sterling
real exchange rate 370
7.1 Asymptotic critical values for AO and IO models 1, 2 and 3,
with λb = λcb 423
8.1 Asymptotic 95% quantile for Wald test statistics for a break in
the trend 445
8.2 Model 2 asymptotic and finite sample 5% critical values for
Sup-W(λb , γ) for the joint null hypothesis of a break in the
trend function and unit root 446
8.3 Asymptotic critical values for τ̃γ^(i)(λb), AO and IO models 1, 2
and 3, with λb selected by different criteria 449
8.4 5% critical values for the τ̃γ^(i)(λb) test, i = Model 1, Model 1a;
break date = Tb = T̃b + 1 458
8.5 Summary of Wald-type diagnostic break tests for one break 478
8.6 BIC for the AO and IO versions of Models 1, 2, and 3 479
8.7 Break dates suggested by various criteria in Models 1, 2 and 3 480
8.8 Estimation of AO and IO Models (single-break model) 482
8.9 Estimation of two breaks, IO (2, 3) Model 483
8.10 Bootstrap critical values for two-break IO (2, 3) model 492
9.1 Quantiles of test statistics: Z1 , Z2 , E1 and E2 505
9.2 Size of τ̂-type tests in the presence of GARCH(1, 1) errors from
different distributions; T = 200, nominal 5% size 510
9.3 Power of τ̂-type tests in the presence of GARCH(1, 1) errors;
ρ = 0.95, 0.9, T = 200 and ut ∼ N(0, 1), nominal 5% size 511
9.4 Critical values for τ̂μ^rma and τ̂β^rma 516
9.5 Size of various τ̂-type tests in the presence of GARCH(1, 1)
errors; T = 200, nominal 5% size 517
9.6 Power of τ̂-type tests in the presence of GARCH (1, 1) errors;
ρ = 0.95, 0.9, T = 200 and ut ∼ N(0, 1), nominal 5% size 518
9.7 Estimation details: ADF(3) for the US savings ratio, with and
without GARCH(1, 1) errors 524
1.3 13
1.4 CDFs of F, Sup-C and Cave tests 14
1.5 CDFs of F, Sup-C and Cave tests for TAR 17
1.6 Hog–corn price ratio (log, monthly data) 19
2.1 Gold–silver price ratio 42
2.2 World oil production 43
2.3 QQ plots of DF τ̂ tests, niid v. t(3) 57
2.4 QQ plots of range unit-root tests, niid v. t(3) 58
2.5 US air passenger miles 71
3.1 Autocorrelations for FI(d) processes 90
3.2 MA coefficients for FI(d) processes 90
3.3 AR coefficients for FI(d) processes 91
3.4 Simulated data for fractional d 92
3.5 Simulated data for fractional d and serial correlation 92
3.6 Survival and conditional survival probabilities 96
3.7 Survival rates of US firms: actual and fitted 98
3.8 Power of DF tests against an FI(d) process 103
3.9 The drift functions for tγ , tη and LM0 130
3.10a Size-adjusted power (data demeaned) 134
3.10b Size-adjusted power (data demeaned), near unit root 134
3.11a Size-adjusted power (data detrended) 137
3.11b Size-adjusted power (data detrended), near unit root 137
3.12a US wheat production (logs) 138
3.12b US wheat production (logs, detrended) 138
4.1 The spectral density of yt for an AR(1) process 158
4.2 Optimal m for GPH estimator 170
4.3a Asymptotic bias, d̂(AG) (r), T = 500 180
4.3b Asymptotic bias, d̂(AG) (r), T = 2,500 180
4.4a Asymptotic rmse, d̂(AG) (r), T = 500 181
4.4b Asymptotic rmse, d̂(AG) (r), T = 2,500 181
4.5 Bias of selected GPH and MGPH variants, zero initial value 195
4.6 Mse of selected GPH and MGPH variants, zero initial value 195
4.7 Bias of selected GPH and MGPH variants, non-zero initial value 196
4.8 Mse of selected GPH and MGPH variants, non-zero initial value 196
4.9 Bias of selected GPH and MGPH variants, trended data, β1 = 1 197
4.10 Mse of selected GPH and MGPH variants, trended data, β1 = 1 198
4.11a Bias of selected GPH and MGPH variants, trended data, β1 = 10 199
4.11b Mse of selected GPH and MGPH variants, trended data, β1 = 10 199
4.12 Mean of best of LW estimators, zero initial value 217
4.13 Mse of best of LW estimators, zero initial value 218
4.14 Mean of best of LW estimators, non-zero initial value 219
4.15 Mse of best of LW estimators, non-zero initial value 220
4.16 Mean of best of LW estimators, trended data 220
4.17 Mse of best of LW estimators, trended data 221
4.18 US T-bill rate 224
4.19a Estimates of d as bandwidth varies, US T-bill rate 227
4.19b d̂ELW,μ as bandwidth varies, US T-bill rate 227
4.20 Gold price US$ (end month) 1974m1 to 2004m12 228
4.21a Estimates of d as bandwidth varies, price of gold 230
4.21b d̂ELW,μ as bandwidth varies, price of gold 230
5.1 ESTAR transition function 244
5.2a ESTAR and random walk, θ = 0.02 245
5.2b ESTAR transition function, θ = 0.02 245
5.3a ESTAR and random walk, θ = 2.0 246
5.3b ESTAR transition function, θ = 2.0 246
5.4a AESTAR and random walk, θ+ = 0.02, θ− = 2.0 247
5.4b AESTAR transition function, θ+ = 0.02, θ− = 2.0 247
5.5a AESTAR and random walk, γ1 + = −1.0, γ1 − = −0.1 248
5.5b AESTAR transition function, γ1 + = −1.0, γ1 − = −0.1 248
5.6a ESTAR variable ar coefficient, βt , θ = 0.02, γ1 = −1.0 249
5.6b ESTAR variable ar coefficient, βt , θ = 2.0, γ1 = −1.0 249
5.7a ESTAR variable ar coefficient, βt , θ+ = 0.02, θ− = 2.0, γ1 = −1.0 250
5.7b ESTAR variable ar coefficient, βt , γ1+ = −1.0, γ1− = −0.1, θ = 0.02 250
5.8 ESTAR(2) simulation, multiple equilibria 257
5.9 ESTAR(2) simulation, stable limit cycle 258
5.10 LSTAR transition function 261
5.11a LSTAR and random walk, ψ = 0.75 262
5.11b LSTAR transition function, ψ = 0.75 262
5.12a LSTAR and random walk, ψ = 6.0 263
5.12b LSTAR transition function, ψ = 6.0 263
5.13a Bounded random walk interval, α1 = 3, α2 = 3 267
5.13b Bounded and unbounded random walks, α1 = 3, α2 = 3 267
5.14a Bounded random walk interval, α1 = 5, α2 = 5 268
5.14b Bounded and unbounded random walks, α1 = 5, α2 = 5 268
5.15 Inf-t statistics for a unit root 273
5.16a ESTAR power functions, γ1 = −1, σε = 1 276
5.16b ESTAR power functions, γ1 = −0.1, σε = 1 276
5.17a ESTAR power functions, γ1 = −1, σε = 0.4 277
5.17b ESTAR power functions, γ1 = −0.1, σε = 0.4 277
5.24 308
5.25 rmse of estimates of φ1 as θ varies 309
5.26 Mean of estimates of γ1 as σε and θ vary 310
5.27 rmse of estimates of γ1 as σε and θ vary 311
5.28 Distribution of bias of estimates of α0 and α1 , symmetry imposed 312
5.29 Distribution of bias of estimates of α0 and α1 , symmetry not
imposed 312
5.30 QQ plots of NLS estimates of bias of α1 313
5.31 Distribution of the LR test statistic for symmetry 315
5.32 QQ plot of LR test statistic for symmetry against χ2 (1) 315
A5.1 Plot of quantiles for STAR PSE test statistics 324
6.1 Dollar–sterling real exchange rate (logs) 362
6.2 ESTAR(2): Residual sum of squares 364
6.3 Two-scale figure of real $: £ rate and β̂t 367
7.1 F test of AO restriction 407
7.2 F test of IO restriction 407
7.3 Selection by AIC/BIC (relative to DO) 408
7.4 Selection by max-LL, AO and IO models 409
7.5 Power of DF test: Model 1 level break 426
7.6 Power of DF test: Model 1a level break 427
7.7 Asymptotic n-bias: Models 2 and 3 428
7.8 Power of DF test: Model 2 slope break 429
7.9 Power of DF test: Model 3 slope break 429
8.1 Model 1 correct 466
8.2 Model 2 correct 467
8.3 Model 3 correct 468
8.4 Model 2 correct: ρ = 0, ρ = 0.9 470
8.5 US GNP (log and growth rate) 475
8.6a Sup-W test, null: no break in intercept 476
8.6b Sup-W test, null: no break in intercept + slope 476
8.6c Sup-W test, null: no break in slope 477
8.6d ZM Sup-W test: unit root and no breaks 477
8.7a AO Model 2: break date varies 481
8.7b IO Model 1: break date varies 481
8.8a US GNP, split trend, AO Model 2, break = 1945 485
8.8b US GNP, split trend, AO Model 2, break = 1938 485
9.3 Simulated time series with GARCH(1, 1) errors 508
9.4 US personal savings ratio (%) 523
τ̂β standard DF t-type test statistic using detrended data
δ̂ ≡ T(ρ̂ − 1), where ρ̂ is the least squares (LS) estimator
δ̂μ as in the specification of τ̂μ
δ̂β as in the specification of τ̂β
⇒D convergence in distribution (weak convergence)
→p convergence in probability
→ tends to, for example ε tends to zero, ε → 0
→ mapping
⇒ implies
∼ is distributed as (or, from the context, left and right hand sides
approach equality)
≡ definitional equality
≠ not equals
≈ approximately
Φ(z) the cumulative distribution function of the standard normal
distribution
ℝ the set of real numbers; the real line (−∞ to ∞)
ℝ+ the positive half of the real line
N+ the set of nonnegative integers
εt white noise unless explicitly excepted
∏_{j=1}^{n} xj the product of xj , j = 1, …, n
∑_{j=1}^{n} xj the sum of xj , j = 1, …, n
0+ approach zero from above
0− approach zero from below
L the lag operator, L^j yt ≡ yt−j
Δ the first difference operator, Δ ≡ (1 − L)
Δ_s the s-th difference operator, Δ_s ≡ (1 − L^s)
Δ^s the s-th multiple of the first difference operator, Δ^s ≡ (1 − L)^s
⊂ a proper subset of
⊆ a subset of
∩ intersection of sets
∪ union of sets
∈ an element of
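The difference and fractional-difference operators listed above (and developed in Chapter 3 via the binomial expansion of (1 − L)^d) can be made concrete with a short sketch. This is an illustrative implementation, not code from the book: the function names are invented here, and truncating pre-sample values to zero corresponds to the "type II" convention discussed in Chapter 3.

```python
def frac_diff_coeffs(d, n):
    """Coefficients b_j in the binomial expansion
    (1 - L)^d = sum_{j>=0} b_j L^j,
    computed by the recursion b_0 = 1, b_j = b_{j-1} * (j - 1 - d) / j."""
    b = [1.0]
    for j in range(1, n + 1):
        b.append(b[-1] * (j - 1 - d) / j)
    return b

def frac_diff(y, d):
    """Apply the truncated fractional difference (1 - L)^d to a series y,
    treating pre-sample values as zero (type II convention).
    For d = 1 this reduces to the first difference, y_t - y_{t-1}
    (with the first observation differenced against an implicit zero)."""
    n = len(y) - 1
    b = frac_diff_coeffs(d, n)
    return [sum(b[j] * y[t - j] for j in range(t + 1)) for t in range(len(y))]

# Integer d recovers the ordinary difference operator: (1 - L)^1 = 1 - L,
# so the coefficients are 1, -1 and then zeros.
print(frac_diff_coeffs(1.0, 3))
# Fractional d gives slowly decaying, never-terminating weights.
print(frac_diff_coeffs(0.5, 3))
```

For fractional d the weights decay hyperbolically rather than cutting off at a finite lag, which is the mechanical source of the long memory in an I(d) process.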
This book is the second volume of Unit Root Tests in Time Series and is subtitled
‘Extensions and Developments’. The first volume was published in 2011
(Patterson, 2011) and subtitled ‘Key Concepts and Problems’ (referred to herein as UR,
Vol. 1). Additionally, a third contribution, although the first in the sequence,
entitled A Primer for Unit Root Testing (referred to herein as A Primer), was pub-
lished in 2010 (Patterson, 2010), and completes the set. The books can be read
independently depending on the reader’s background and interests.
I conceived this project around ten years ago, recognising a need to present
and critically assess the key developments in unit root testing, a topic that barely
twenty years before was only occasionally referenced in the econometrics liter-
ature, although its importance had been recognised long before, for example
in the modelling strategy associated with Box and Jenkins (1970). (See Mills
(2011), for an excellent overview in the context of the development of modern
time series analysis.)
An econometric or statistical procedure can be judged to be influential when
it reaches into undergraduate courses as well as becoming standard practice in
research papers and articles, and no serious empirical analysis of time series is
now presented without reporting seemingly obligatory unit root tests. However,
in the course of becoming standard practice many of the nuances and concerns
about the unit root testing framework, carefully detailed in the original research,
may be overlooked, the job being considered done once the unit root ‘tick box’ has
been checked.
Undergraduate dissertations and projects, as well as PhD theses, involving time
series, usually report Dickey-Fuller (DF) statistics or perhaps, depending on the
available software, tests in the form due to Elliott, Rothenberg and Stock (1996),
the ‘ERS’ tests. However, it can be the case that the analysis of the time series
properties of the data is somewhat superficial. Faced with the task of analysing
and modelling time series, I am struck by how careful one has to be and how
many possibilities there are that complicate the question of interpreting the out-
come of a unit root test. This is the starting point of this book and is what distinguishes
it from Volume 1.
When considering what to include here I took the view that the topics must
address the development of the standard unit root testing framework to situ-
ations that were likely to occur in practice. Moreover, whilst an overview of
as many problems as possible would have been one organising principle, this
approach would risk taking a ‘menu’-type solution to potential problems and,
instead, it would be preferable to address in detail some developments that have
simplicity.
Although much of the unit root framework has been developed in the econo-
metrics literature, the presence of key articles in general statistical journals (for
example, The Annals of Statistics, Biometrika, The Journal of the Royal Statistical
Society, The Journal of the American Statistical Association and so on) indicates
the wider interest and wider importance of the topic. Moreover, the tech-
niques have been finding interesting applications outside economics, showing
just how pervasive is the basic underlying problem. To give some illustrations,
Stern and Kaufman (1999) analysed a number of time series related to global
climate change, including temperatures for the northern and southern hemi-
spheres, carbon dioxide, nitrous oxide and sulphate aerosol emissions. Wang
et al. (2005) analysed the streamflow processes of 12 rivers in western Europe
for nonstationarity for the twentieth century, streamflow being an issue of con-
cern in the design of a flood protection system, a hydrological connection that
includes Hurst (1951) and Hosking (1984). Aspects of social behaviour have been
the subject of study, such as the nature of the gender gap in drink-driving (see
Schwartz and Rookey, 2008).
I noted in Volume 1 that the research and literature on this topic has grown
almost exponentially since Nelson and Plosser’s (N&P) seminal article published
in 1982. N&P applied the framework due to Dickey (1976) and Fuller (1976) to
testing for a unit root in a number of macroeconomic time series. Whilst the
problem of modelling nonstationary series was known well before 1982, and
was routinely taken into account in the Box-Jenkins methodology, the focus of
the N&P study was on the stochastic nature of the trend and its implications
for economic policy and the interpretation of ‘shocks’; this led to interest in the
properties of economic time series as worthy of study in itself.
I also noted in Volume 1 that articles on unit root tests are amongst the most
cited in economics and econometrics and have influenced the direction of eco-
nomic research at a much wider level. A citation summary for articles based
on univariate processes was included therein and showed that there has been a
sustained interest in the topic over the last thirty years. Out of interest I have
updated this summary and calculated the implied annual growth rate, with the
‘top 5’ on a citations basis presented in Table P1. The rate of growth is quite
astonishing and demonstrates the continuing interest in the topic of unit roots;
Table P1 (extract): Perron (1989), 3,371 rising to 3,932 citations, 11.5% implied annual growth; KPSS (1992), 3,280 rising to 3,996 citations, 15.0% implied annual growth;
for example, there have been 1,300 more citations of the seminal Dickey and
Fuller (1979) article in less than 18 months.
As in the case of Volume 1, appropriate prerequisites for this book include
some knowledge of econometric theory at an intermediate or graduate level as,
for example, in Davidson (2000), Davidson and MacKinnon (2004) or Greene
(2011); also some exposure to the basic DF framework for testing for a unit
root would be helpful, but this is now included in most introductory courses in
econometrics (see, for example, Dougherty, 2011).
The results of a number of Monte Carlo studies are reported in various
chapters, reflecting my view that simulation is a key tool in providing guidance
on finite sample issues. Many more simulations were run than are reported in
the various chapters, with the results typically illustrated for one or two sample
sizes where they are representative of a wider range of sample sizes.
Chapter 1: Introduction
First, the opportunity is taken to offer a brief reminder of some key concepts
that underlie the implicit language of unit root tests. In addition there is also an
introduction to an econometric problem that occurs in several contexts in test-
ing for a unit root and is referenced in later chapters. The problem is that of a
‘nuisance’ parameter that is present only under the alternative hypothesis, sometimes
referred to as the ‘Davies problem’. The problem and its
solution occur in several other areas of econometrics and it originates, at least
in its econometric interest, in application of the well-known Chow test (Chow,
1960) to the temporal stability of a regression model, but in the case that the
temporal point of instability is unknown.
included in UR, Vol. 1. The tests in the latter were largely parametric in the
sense that they were concerned with estimation in the context of an assumed
parametric structure, usually in the form of an AR or ARMA model, the slight
exception being in estimating the long-run variance based on a semi-parametric
procedure. In this volume nonparametric tests are considered in greater detail,
where such tests use information in the data, such as ranks, signs and runs, that
do not require it to be structured by a model.
do not use all of the frequency range in estimating the long-memory parameter.
Research on these methods has been a considerable growth area in the last ten
years or so, leading to a potentially bewildering number of estimation methods.
not observed. There are a number of models that allow unit root-type behaviour,
but take into account either the limits set by the data range or the economic
mechanisms that are likely to prevent unbounded random walks in the data.
These models have in common that they involve some form of nonlinearity.
The models considered in this chapter involve a smooth form of nonlinearity,
including smooth transition autoregressions based on the familiar AR model,
allowing the AR coefficients to change as a function of the deviation of an
‘activating’ or ‘exciting’ variable from its target or threshold value. A second
form of nonlinear model is the bounded random walk, referred to as a BRW,
which allows random walk behaviour to be contained by bounds, or buffers,
that ‘bounce’ the variable back if it is getting out of control.
The idea that regime change, rather than a unit root, was the cause of nonsta-
tionarity led to a fundamental re-evaluation of the simplicity of the dichotomy
of ‘opposing’ the mechanisms of a unit root process on the one hand and a
trend stationary process on the other. Examples of events momentous enough
to be considered as structural breaks include the Great Depression (or Crash),
the Second World War, the 1973 OPEC oil crisis and more recently ‘9/11’, the
financial crisis (the ‘credit crunch’) of 2008 and the government debt problem
in the Euro area. Under the alternative hypothesis of stationarity, trended time
series affected by such events could be modelled as stationary around a broken
or split trend. Chapter 7 is primarily concerned with the development of an
appropriate modelling framework for assessing the impact of a possible trend or
mean break at a known break date, which is necessary to understand the more
realistic cases dealt with in Chapter 8.
and time again Hanselman and Littlefield (2005) was an essential and highly
recommended companion to the use of MATLAB.
I would like to thank my publishers Palgrave Macmillan, and Taiba Batool
and Ellie Shillito in particular, for their continued support not only for this
book but in commissioning the series of which it is a part, namely ‘Palgrave
Texts in Econometrics’, and more recently extending the concept in a practical
way to a new series, ‘Palgrave Advanced Texts in Econometrics’, inaugurated by
publication of The Foundations of Time Series Analysis (Mills, 2011). These two
series have, at the time of writing, led to eight books, with more in press and,
combined with the Handbook of Econometrics published in two volumes (Mills
and Patterson, 2006, 2009), have made an important contribution to scholarly
work and educational facilities in econometrics. Moreover, as noted above, the
increasing ubiquity of modern econometric techniques means that the ‘reach’
of these contributions extends into non-economic areas, such as climatology,
geography, meteorology, sociology, political science and hydrology.
In the earlier stages of preparation of the manuscript and figures for this book
(as in the preparation of other books) I had the advantage of the willing and able
help of Lorna Eames, my secretary in the School of Economics at the University
of Reading, to whom I am grateful. Lorna Eames retired in April 2011. Those who
have struggled with the many tasks involved in the preparation of a mansuscript,
particularly one of some length, will know the importance of having a reliable
and willing aide; Lorna was exceptionally that person.
If you have comments on any aspects of the book, please contact me at my
email address given below.
Introduction
The first four sections of this chapter review some basic concepts that, in part,
serve to establish a common notation and define some underlying concepts for
later chapters. In particular Section 1.1 considers the nature of a stochastic pro-
cess, which is the conceptualisation underlying the generation of time series
data. Section 1.2 considers stationarity in its strict and weak forms. As in Vol-
ume 1, the parametric models fitted to univariate time series are often variants
of ARIMA models, so these are briefly reviewed in Section 1.3 and the concept of
the long-run variance, which is often a key component of unit root tests where
there is weak dependency in the errors, is outlined in Section 1.4. Section 1.5 is
more substantive and considers, by means of some simple illustrations, a problem
and its solution that arise in Chapters 5, 6 and 8. The problem is that of
designing a test statistic when there is a parameter that is not identified under
the null hypothesis. The two cases considered here are variants of the Chow test
(Chow, 1960), which tests for stability in a regression model. This section also
considers how to devise a routine to obtain the bootstrap distribution of a test
statistic and the bootstrap p-value of a sample test statistic.
example is given below.
A discrete-time stochastic process, ε(t), with T ⊆ N+, may be summarised as:
ε(t). If T = 3, then the sample space of yt, Ωy, is simply related to Ωε; specifically,
Ωy = {(1, 2, 3), (1, 2, 1), (1, 0, 1), (1, 0, −1), (−1, −2, −3), (−1, −2, −1), (−1, 0, −1),
(−1, 0, 1)}, and each ordered component of the sequence defines a sample
path. The resulting stochastic process is a binomial random walk and in this case,
as it has been assumed that the coin is fair, the random walk is symmetric. Whilst
the original stochastic process for εt is stationary, that for yt is not; the definition
of stationarity is considered in the next section. A simple generalisation of this
partial sum process is to generate the εt as draws from a normally distributed
random variable, giving rise to a Gaussian random walk.
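The enumeration of the sample paths for T = 3 can be verified directly; the following is a minimal sketch in Python (the book's own computations were carried out in MATLAB; the language choice here is mine), scoring a fair coin as ±1 and building the partial sums for every possible sequence of draws:

```python
from itertools import product

# All sequences of T = 3 draws from {+1, -1} (a fair coin scored as +/-1)
T = 3
paths = []
for eps in product([1, -1], repeat=T):
    # Partial sums y_t = eps_1 + ... + eps_t define one sample path
    y, path = 0, []
    for e in eps:
        y += e
        path.append(y)
    paths.append(tuple(path))

# Eight equally likely sample paths of the symmetric binomial random walk
print(sorted(set(paths)))
```

With a fair coin each of the eight paths has probability 1/8, which is what makes the random walk symmetric.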
Stochastic processes arise in many different applications. For example, pro-
cesses involving discrete non-negative numbers of arrivals or events are often
modelled as a Poisson process, an illustrative example being the number of
shoppers at a supermarket checkout in a given interval (for an illustration see
A Primer, chapter 3). A base model for returns on financial assets is that they
follow a random walk, possibly generated by a ‘fat-tailed’ distribution with a
higher probability of extreme events compared to a random walk driven by
Gaussian inputs. To illustrate, a partial sum process was defined as in the case of
the binomial random walk with the exception that the ‘inputs’ εt were drawn
from a Cauchy distribution (for which neither the mean nor the second moment
exist), so that there was a greater chance of ‘outliers’ compared to draws from
a normal distribution. The fat tails allow the simulations to mimic crisis events
such as the events following 9/11 and the 2008 credit crunch. Figure 1.1 shows
4 of the resulting sample paths over t = 1, . . . , 200. The impact of the fat tails
is evident and, as large positive and large negative shocks can and do occur, the
sample paths (or trajectories) show some substantial dips and climbs over the
sample period.
Figure 1.1  Four sample paths of a random walk with Cauchy inputs, yt against t = 1, …, 200
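A sketch of the kind of simulation behind Figure 1.1, with standard Cauchy draws as the inputs to the partial sum process; the seed, the path count and the use of numpy are illustrative choices, not the book's settings:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
T, n_paths = 200, 4

# Cauchy 'inputs': fat tails, so neither the mean nor the second moment exists
eps = rng.standard_cauchy(size=(n_paths, T))

# Partial sums give the random walk sample paths y_t = sum_{j<=t} eps_j
y = eps.cumsum(axis=1)

# Occasional very large single-period jumps are what mimic crisis events
print(y.shape, float(np.abs(eps).max()))
```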
P(yτ+1, yτ+2, …, yτ+T) = P(ys+1, ys+2, …, ys+T)

where P(.) is the joint probability mass function (pmf) for the random variables
enclosed in (.). Strict stationarity requires that the joint pmf for the sequence of
random variables of length T starting at time τ + 1 is the same for any shift in the
time index from τ to s and for any choice of T. This means that it does not matter
which T-length portion of the sequence is observed: each has been generated by
the same unchanging probability structure. A special case of this result in the
discrete case is that for T = 1, where P(yτ) = P(ys), so that the marginal pmfs must
be the same for all τ and s, implying that E(yτ) = E(ys). These results imply that
other moments, including joint moments such as the covariances, are invariant
to arbitrary time shifts.
Weak (or covariance) stationarity imposes conditions on the first two moments only:

E(yτ) = E(ys) = µy ⇒ the mean of the process is constant
var(yτ) = var(ys) = σ²y ⇒ the variance of the process is constant
cov(yτ, yτ+k) = cov(ys, ys+k) ⇒ the autocovariances are invariant to translation, for all k
Ross (2003) gives examples of processes that are weakly stationary but not strictly
stationary; also, a process could be strictly stationary, but not weakly stationary
by virtue of the non-existence of its moments. For example, a random process
where the components have unchanging marginal and joint Cauchy distribu-
tions will be strictly stationary, but not weakly stationary because the moments
do not exist.
The leading case of nonstationarity, at least in econometric terms, is that
induced by a unit root in the AR polynomial of an ARMA model for yt , which
implies that the variance of yt is not constant over time and that the k-th order
autocovariance of yt depends on t. A process that is trend stationary is one that is
stationary after removal of the trend. A case that occurs in this context is where
the process generates data as yt = β0 + β1 t + εt , where εt is a zero mean weakly
stationary process. This is not stationary of order one because E(yt ) = β0 + β1 t,
which is not invariant to t. However, if the trend component is removed either
as E(yt − β1 t) = β0 or E(yt − β0 − β1 t) = 0, the result is a stationary process.
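Detrending can be illustrated with simulated data; a sketch with illustrative values β0 = 2 and β1 = 0.5 (my choices), removing the fitted linear trend by OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
t = np.arange(1, T + 1)

# y_t = beta0 + beta1*t + eps_t, with eps_t weakly stationary (here iid N(0,1))
beta0, beta1 = 2.0, 0.5
y = beta0 + beta1 * t + rng.standard_normal(T)

# OLS regression of y on (1, t); the residuals are the detrended series
X = np.column_stack([np.ones(T), t])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
detrended = y - X @ coef

print(coef.round(2), round(float(detrended.mean()), 6))
```

The detrended series has a mean of (numerically) zero and inherits the stationarity of εt, which is the sense in which yt is trend stationary.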
Returning to the partial sum process yt = Σtj=1 εj, with εt a binomial random
variable, note that E(εt) = 0, E(ε²t) = σ²ε = 1 and E(εtεs) = 0 for t ≠ s. However, in
the case of yt, whilst E(yt) = Σtj=1 E(εj) = 0, the variance of yt is not constant as
t varies: E(y²t) = tσ²ε = t. Moreover, the autocovariances are neither equal to
zero nor invariant to a translation in time; for example, consider the first-order
autocovariances cov(y1, y2) and cov(y2, y3), which are one period apart; then
cov(y1, y2) = E[ε1(ε1 + ε2)] = σ²ε, whereas cov(y2, y3) = E[(ε1 + ε2)(ε1 + ε2 + ε3)] = 2σ²ε.
The general result is that cov(yt, yt+1) = tσ²ε, so the partial sum process generat-
ing yt is not covariance stationary. The assumption that εt is a binomial random
variable is not material to this argument, which generalises to any process with
E(ε²t) = σ²ε and E(εtεs) = 0 for t ≠ s, such as εt ∼ N(0, σ²ε).
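These moments can be checked by Monte Carlo; a sketch with ±1 binomial inputs, so that σ²ε = 1 (the replication count and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
R, T = 200_000, 5  # R replications of a short random walk

# eps_t = +1 or -1 with probability 1/2 each, so E(eps) = 0 and var(eps) = 1
eps = rng.choice([-1, 1], size=(R, T))
y = eps.cumsum(axis=1)  # partial sums y_t

# var(y_t) = t * sigma_eps^2 = t, so the variance grows with t
var_t = y.var(axis=0)

# cov(y_t, y_{t+1}) = t: the autocovariances shift with a translation in time
cov_12 = float(np.mean(y[:, 0] * y[:, 1]))  # approx 1
cov_23 = float(np.mean(y[:, 1] * y[:, 2]))  # approx 2
print(var_t.round(2), round(cov_12, 2), round(cov_23, 2))
```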
in the AR component and q in the MA component for the univariate process
generating yt, assuming that E(yt) = 0, is written as follows:

φ(L)yt = θ(L)εt

where φ(L) = 1 − Σpj=1 φjLj and θ(L) = 1 + Σqj=1 θjLj.
For economy of notation it is conventional to use, for example, φ(1) rather than
φ(L = 1). By taking the AR terms to the right-hand side, the ARMA(p, q) model
is written as:
The term µt has the interpretation of a trend function, the simplest and most
frequently occurring cases being where yt has a constant mean, so that µt = µ,
and yt has a linear trend, so that µt = β0 + β1 t.
The ARMA model can then be written in deviations form by first defining
ỹt ≡ yt − µt , with the interpretation that ỹt is the detrended (or demeaned) data,
With µ̂t replacing µt , (1.9) becomes:
where, for consistency with (1.8), µ∗t = φ(L)µt. For example, in the constant
mean and linear trend cases, µ∗t is given, respectively, by:

µ∗t = µ∗ = φ(1)µ (1.12a)

µ∗t = φ(L)(β0 + β1t) (1.12b)
    = φ(1)β0 + β1φ(L)t
    = β∗0 + β∗1t

where β∗0 = φ(1)β0 + β1 Σpj=1 jφj, β∗1 = φ(1)β1 and φ(1) = 1 − Σpj=1 φj.
An equivalent representation of the ARMA(p, q) model is to factor out the
dominant root, so that φ(L) = (1 − ρL)ϕ(L), where ϕ(L) is invertible, and the
resulting model is specified as follows:
(1 − ρL)yt = zt (1.13a)
ϕ(L)zt = θ(L)εt (1.13b)
zt = ϕ(L)−1θ(L)εt (1.13c)
This error dynamics approach (see UR, Vol. 1, chapter 3) has the advantage of
isolating all dynamics apart from that which might be associated with a unit
root (ρ = 1 in this case) into the error zt .
An important concept in unit root tests is the long-run variance. It is one of three
variances that can be defined when yt is determined dynamically, depending on
the extent of conditioning in the variance. This is best illustrated with a simple
example and then generalised. Consider an AR(1) model:
yt = ρyt−1 + εt
   = (1 − ρL)−1εt
   = Σ∞i=0 ρi εt−i    moving average form (1.16)

The conditional variance, var(yt | yt−1) = σ²ε, takes yt−1 as given; the unconditional
variance also takes the variation of yt−1 into account:

σ²y = var(ρyt−1 + εt)
    = var(Σ∞i=0 ρi εt−i)
    = Σ∞i=0 ρ2i σ²ε    because cov(εt−i, εt−j) = 0 for i ≠ j
    = σ²ε/(1 − ρ²)

The long-run variance instead sums the moving average coefficients before squaring,
so that σ²y,lr = (Σ∞i=0 ρi)² σ²ε = σ²ε/(1 − ρ)².
In general, yt is generated as:

yt = w(L)εt (1.18)

where the lag polynomial w(L) is the causal linear (MA) filter governing the response of
{yt} to {εt}. In an ARMA model, the MA polynomial is w(L) = Σ∞j=0 wjLj = φ(L)−1θ(L), with
w0 = 1; for this representation to exist, the roots of φ(L) must lie outside the unit
circle, so that φ(L)−1 is defined. The long-run variance of yt, σ²y,lr, is then just
σ²y,lr = w(1)² σ²ε, where w(1) = Σ∞j=0 wj.
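For an AR(1) with ρ = 0.5 and σ²ε = 1, the variances can be computed from the MA coefficients wj = ρ^j; a sketch comparing truncated sums with the closed forms σ²y = σ²ε/(1 − ρ²) and σ²y,lr = σ²ε/(1 − ρ)² (the truncation point is an arbitrary choice):

```python
import numpy as np

rho, sigma2_eps = 0.5, 1.0

# MA (Wold) coefficients of the AR(1): w_j = rho^j, j = 0, 1, ...
J = 200  # truncation point; rho^200 is negligible
w = rho ** np.arange(J)

# Unconditional variance: sum of *squared* MA coefficients, times sigma2_eps
sigma2_y = sigma2_eps * np.sum(w ** 2)   # -> 1/(1 - rho^2)

# Long-run variance: square of the *sum* of MA coefficients, w(1)^2
sigma2_lr = sigma2_eps * np.sum(w) ** 2  # -> 1/(1 - rho)^2

print(round(float(sigma2_y), 4), round(float(sigma2_lr), 4))
```

The order of the operations, square-then-sum against sum-then-square, is exactly what separates the unconditional variance from the long-run variance.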
1.5 The problem of a nuisance parameter only identified under
the alternative hypothesis
This section considers the general nature of a problem that occurs in different
developments of models with unit roots, the solution to which has applications
in later chapters, particularly Chapters 5, 6 and 8. The problem of interest is
sometimes referred to as the Davies problem, after Davies (1977, 1987), who
considered hypothesis testing when there is a nuisance parameter that is only
present under the alternative hypothesis. An archetypal case of such a problem
is where the null model includes a linear trend, which is hypothesised to change
at some particular point in the sample. If that breakpoint was known, obtain-
ing a test statistic would just require the application of standard principles, for
example an F test based on fitting the model under the alternative hypothesis.
However, in practice, the breakpoint is unknown and one possibility is to search
over all possible breakpoints and calculate the test statistic for each possibility;
however, the result of this enumeration over all possibilities is that the resulting
statistic, the F test for example, no longer has an F distribution.
To illustrate the problem and its solution, we consider a straightforward
problem of testing for structural stability in a bivariate regression with a non-
stochastic regressor and then in the context of a simple AR model. At this stage
the exposition does not emphasise the particular problems that arise if a unit
root is present (either under the null or the alternative). The aim here is to set
out a simple framework that is easily modified in later chapters.
discrete events, often institutional in nature, for example a change in the tax
regime or in political institutions.
In the second case there are again two regimes, but they are separated by an
endogenous indicator function, rather than the exogenous passage of time. The
typical case here is where the data sequence {yt }Tt=1 is generated by an AR(p)
model under H0 , whereas under HA the coefficients of the AR(p) model change
depending on a function of lagged yt relative to a threshold κ. In both models the
nature of the change is quite limited. According to HA the basic structure of the
model is not changed, but the coefficients are allowed to change. The common
technical problem of interest here is that typically the structural breakpoint or
threshold is not known, whereas standard tests assume that these are known.
Consider the standard problem of applying the Chow test for a structural
break. A simple bivariate model, with a nonstochastic regressor, is sufficient to
illustrate the problem.
yt = α0 + α1 xt + εt t = 1, . . . , T (1.19)
where εt ∼ iid(0, σε2 ), that is {εt }Tt=1 is a sequence of independent and identically
distributed ‘shocks’ with zero mean and constant variance, σε2 ; and, along the
lines of an introductory approach, assume that xt is fixed in repeated samples.
The problem is to assess whether the regression model has been ‘structurally
stable’ in the sense that the regression coefficients α0 and α1 have been constant
over the period t = 1, . . . T, where for simplicity σε2 is assumed a constant. There
are many possible schemes for inconstancy, but one that is particularly simple
and has found favour is to split the overall period of T observations by intro-
ducing a breakpoint (in time) Tb , such that there are T1 observations in period 1
followed by T2 observations in period 2, thus T = T1 + T2 and t = 1, . . . , T1 , T1 +
1, . . . , T; note that Tb = T1 . Such a structural break is associated with a discrete
change related to external events.
According to this scheme the regression model may be written as:

yt = α0 + α1xt + (δ0 + δ1xt)I(t > T1) + εt (1.20a)

where I(t > T1) = 1 if the condition in (.) is true and 0 otherwise. There will be
several uses of this indicator function in later chapters. The null hypothesis of
no change is H0: δ0 = δ1 = 0 and the alternative hypothesis is that one or both
of the regression coefficients has changed, HA: δ0 ≠ 0 and/or δ1 ≠ 0.
A regression for which H0 is not rejected is said to be stable. A standard test
statistic, assuming, as we do here, that there are sufficient observations in each
regime to enable estimation, is an F test, which will be distributed as F(2, T − 4)
if we add the assumption that εt is normally distributed (so that it is niid) or T is
sufficiently large to enable the central limit theorem to deliver normality. The
F test statistic, first in general, is:
C = [(ε̂′r ε̂r − ε̂′ur ε̂ur)/(ε̂′ur ε̂ur)] × (T − K)/g ∼ F(g, T − K) (1.21)

and, in the present case, with g = 2 and K = 4,

C = [(ε̂′r ε̂r − ε̂′ur ε̂ur)/(ε̂′ur ε̂ur)] × (T − 4)/2 (1.22)
where ε̂r and ε̂ur are the vectors of residuals from fitting (1.19) and (1.20a),
respectively. At the moment the assumption that xt is fixed in repeated samples
removes complications due to possible stochastic regressors.
The F test is one form of the Chow test (Chow, 1960) and obviously generalises
quite simply to more regressors. A large sample version of the Chow test uses the
result that if a random variable, C, has the F(g, T – K) distribution, then gC ⇒D
χ2 (g), where ⇒D indicates weak convergence or convergence in distribution
(see A Primer, chapter 4). The test is sometimes presented in Wald form as:
W = T (ε̂′r ε̂r − ε̂′ur ε̂ur)/(ε̂′ur ε̂ur) (1.23)
  = g [T/(T − K)] C
  ≈ gC

so that, in large samples,

W ⇒D χ²(g) (1.24)
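The statistics C and W for a known breakpoint can be sketched as follows; the data generating process, sample size and break date are illustrative choices (the null of stability is true by construction), and the exact algebraic relation between W and C holds regardless:

```python
import numpy as np

rng = np.random.default_rng(7)
T, Tb = 100, 50
t = np.arange(1, T + 1)
x = t / T  # a nonstochastic regressor
y = 1.0 + 2.0 * x + rng.standard_normal(T)  # stable model: H0 is true

def rss(X, y):
    # Residual sum of squares from OLS of y on X
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

Xr = np.column_stack([np.ones(T), x])        # restricted: no break
D = (t > Tb).astype(float)                   # indicator I(t > T1)
Xur = np.column_stack([Xr, D, D * x])        # unrestricted: break at Tb

g, K = 2, 4
C = (rss(Xr, y) - rss(Xur, y)) / rss(Xur, y) * (T - K) / g
W = T * (rss(Xr, y) - rss(Xur, y)) / rss(Xur, y)
print(round(float(C), 3), round(float(W), 3))
```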
Figure 1.2  Quantile-quantile plot: simulated quantiles of C against the quantiles of F(2, T − 4)
each set of simulations and, for simplicity, λb = 0.5 is used for this illustration.
Figure 1.2 shows a quantile-quantile plot taking the first quantile from F(2, T – 4)
and the matching quantile from the simulated distribution of C, based on R =
50,000 replications; as a reference point, Figure 1.2 also shows as the solid line
the 45°line resulting from pairing the quantiles of F(2, T – 4) against themselves.
The conformity of the simulated and theoretical distributions is evident from
the figure.
In the second case, λb is still taken as given for a particular set of R replications,
but λb is then varied to generate further sets of R replications; in this case λb
∈ Λ = [0.10, 0.90], so that there is 10% trimming, and the range of Λ is divided into
a grid of G = 81 equally spaced points (so that, with T = 100, each integer value of Tb
from 10 to 90 is considered as a possible breakpoint). This enables an assessment of the variation, if any,
in the simulated distributions as a function of λb (but still taking λb as given for
each set of simulations). The 95% quantiles for each of the resulting distributions
are shown in Figure 1.3, from which it is evident that, although there are some
variations, the quantiles are practically invariant, even for λb close to the beginning and
end of the sample (the maximum number of observations in each regime is at
the centre of the sample period, so that the number of observations in one of
the regimes declines as λb → 0.1 or λb → 0.9).
As noted, the more likely case in practice is that a structural break is suspected
but the timing of it is uncertain, so that λb is unknown. The questions then
arise as to what might be taken as a single test statistic in such a situation and
Figure 1.3  95% quantiles of the simulated distributions of C as a function of λb, with the 95% quantile of F(2, 96) shown for reference
what is the distribution of the chosen test statistic. One possibility is to look for
the greatest evidence against the null hypothesis of structural stability in the
set Λ, so that the chosen test statistic is the supremum of C, say Sup-C, over all
possible breakpoints, with the breakpoint being estimated as that value of λb,
say λ̂b, that is associated with Sup-C. This idea is associated with the work
of Andrews (1993) and Andrews and Ploberger (1994), who also suggested the
arithmetic average and the exponential average of the C values over the set Λ.
For reference these are as follows:

Sup-C = sup{C(λb): λb ∈ Λ} (1.25)
C(λb)ave = G−1 Σλb∈Λ C(λb) (1.26)
C(λb)exp = ln[G−1 Σλb∈Λ exp(C(λb)/2)] (1.27)
Note that Andrews presents these test statistics in their Wald forms referring
to, for example, Sup-W(.). The F form is used here as it is generally the more
familiar form in which the Chow test is used; the translation to the Wald form
was given in (1.23). Note that neither C(λb)ave nor C(λb)exp provides an estimate
of the breakpoint.
Andrews (1993) obtained the asymptotic null distributions of the Supremum
versions of the Wald, LR and LM Chow tests and showed them to be standard-
ised tied-down Bessel processes. Hansen (1997b) provides a method to obtain
approximate asymptotic p-values for the Sup, Exp and Ave tests. Diebold and
Chen (1996) consider structural change in an AR(1) model and suggested using
a bootstrap procedure which allows for the dynamic nature of the regression
model and the practicality of a finite sample, for which asymptotic critical values
are likely to be a poor guide.

Figure 1.4  CDFs of F(2, 96), Sup-C(λb) and C(λb)ave; 90% quantiles shown: 1.85 for C(λb)ave, 2.36 for F(2, 96) and 5.08 for Sup-C(λb)
Note that the distribution of Sup-C(λ̂) is not the same as the distribution of
C(λb ), where the latter is C evaluated at λb , so that quantiles of C(λb ) should
not be used for hypothesis testing where a search procedure has been used to
estimate λb . The effect of the search procedure is that the quantiles of Sup-C(λ̂)
are everywhere to the right of the quantiles of the corresponding F distribution.
On the other hand the averaging procedure in C(λb )ave tends to reduce the
quantiles that are relevant for testing.
To illustrate, the CDFs for F(2, 96), Sup-C(λ̂) and C(λb )ave are shown in Figure
1.4 for the procedure that searches over λb ∈ Λ = [0.10, 0.90], with G = 81 equally
spaced points, for each simulation. For example, the 90% and 95% quantiles of
the distributions of Sup-C(λ̂) are 5.08 and 6.02, and those of C(λb )ave are 1.85
and 2.28, compared to 2.36 and 3.09 from F(2, 96).
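The search procedure can be sketched as follows: C(λb) is computed over a trimmed grid of breakpoints and the Sup, Ave and Exp statistics are formed from the results; the data generating process and seed are illustrative choices, with the null of no break true by construction:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100
t = np.arange(1, T + 1)
x = t / T
y = 1.0 + 2.0 * x + rng.standard_normal(T)  # H0 (no break) is true

def chow_C(y, x, Tb, g=2, K=4):
    # Chow F statistic for a break at Tb in a bivariate regression
    T = len(y)
    Xr = np.column_stack([np.ones(T), x])
    D = (np.arange(1, T + 1) > Tb).astype(float)
    Xur = np.column_stack([Xr, D, D * x])
    def rss(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        e = y - X @ b
        return e @ e
    return (rss(Xr) - rss(Xur)) / rss(Xur) * (T - K) / g

# 10% trimming: candidate breakpoints Tb = 10, ..., 90 (G = 81 points)
grid = np.arange(10, 91)
C_vals = np.array([chow_C(y, x, Tb) for Tb in grid])

sup_C = C_vals.max()
ave_C = C_vals.mean()
exp_C = np.log(np.mean(np.exp(C_vals / 2)))
Tb_hat = int(grid[C_vals.argmax()])  # breakpoint estimate tied to Sup-C
print(round(float(sup_C), 3), round(float(ave_C), 3), round(float(exp_C), 3), Tb_hat)
```

By construction Sup-C dominates the averaged statistics, which is the shift to the right visible in Figure 1.4.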
in yt . If the threshold is determined in this way such models are termed self-
exciting, as the movement from one regime to another is generated within the
model’s own dynamics. This gives rise to the acronym SETAR for self-exciting
threshold autoregression if the threshold depends on lagged yt, and to MSETAR
(sometimes shortened to MTAR) for momentum SETAR if the threshold depends
on changes in yt.
The following illustrates a TAR of order one, TAR(1), which is ‘self-exciting’:
yt = ϕ0 + ϕ1 yt−1 + εt if f(yt−1 ) ≤ κ, regime 1 (1.28)
yt = φ0 + φ1 yt−1 + εt if f(yt−1 ) > κ, regime 2 (1.29)
where εt ∼ iid(0, σε2 ), thus, for simplicity, the error variance is assumed con-
stant across the two regimes; for expositional purposes we assume that f(yt−1 )
is simply yt−1 , so that the regimes are determined with reference to a single
lagged value of yt . It is also assumed that |ϕ1 | < 1 and |φ1 | < 1, so that there
are no complications arising from unit roots (the unit root case is considered in
chapter 6). Given this stability condition, the implied long-run value of yt , y∗ ,
depends upon which regime is generating the data, being y∗1 = ϕ0 /(1 − ϕ1 ) in
Regime 1 and y∗2 = φ0 /(1 − φ1 ) in Regime 2.
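The regime-dependent long-run values can be illustrated by simulating the TAR(1) of (1.28)–(1.29); the parameter values are illustrative choices (not from the book) satisfying the stability conditions, with κ = 0 and a common error variance:

```python
import numpy as np

rng = np.random.default_rng(11)

# Illustrative parameters: both regimes stable, |phi1| < 1
c1, a1 = 1.0, 0.5    # regime 1: applies when y_{t-1} <= kappa
c2, a2 = -1.0, 0.5   # regime 2: applies when y_{t-1} > kappa
kappa = 0.0

# Implied long-run (attractor) values in each regime
y_star_1 = c1 / (1 - a1)   # regime 1 attractor
y_star_2 = c2 / (1 - a2)   # regime 2 attractor

# Simulate the self-exciting TAR(1): the regime depends on lagged y
T = 10_000
y = np.zeros(T)
for t in range(1, T):
    if y[t - 1] <= kappa:   # regime 1
        y[t] = c1 + a1 * y[t - 1] + rng.standard_normal()
    else:                   # regime 2
        y[t] = c2 + a2 * y[t - 1] + rng.standard_normal()

print(y_star_1, y_star_2, round(float(y.mean()), 2))
```

With these (symmetric) choices each regime's attractor lies in the other regime, so the process keeps switching and oscillates around the threshold.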
The problem is to assess whether the regression model has been ‘structurally
stable’, with stability defined as H0: ϕ0 = φ0 and ϕ1 = φ1, whereas HA: ϕ0 ≠ φ0
and/or ϕ1 ≠ φ1 is ‘instability’ in this context. As in the Chow case, the two
regimes can be written as one generating model:

yt = (ϕ0 + ϕ1yt−1)I(yt−1 ≤ κ) + (φ0 + φ1yt−1)I(yt−1 > κ) + εt (1.30)
under HA. The problems of estimating κ and, hence, the regime split, and of testing
for the existence of one regime against two, are solved together and take
the same form as in the Chow test.
In respect of the first problem, specify a grid of N possible values for κ, say κ(i) ∈
K and then estimate (1.30) for each κ(i) , taking as the estimate of κ the value of
κ(i) that results in a minimum for the residual sum of squares over all possible
values of κ(i) . As in the Chow test, some consideration must be given to the range
of the grid search. In a SETAR, κ is expected to be in the observed range of yt , but
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
this criterion must be coupled with the need to enable sufficient observations in
each regime to empirically identify the TAR parameters. Thus, starting from the
ordered values of yt , say y(1) < y(2) < . . . < y(T) , some observations are trimmed
from the beginning and end of the sequence {y(t) }T1 , with, typically, 10% or 15%
of the observations trimmed out of the sample to define K.
Obtaining the minimum of the residual sum of squares from the grid search
also leads to the Supremum F test: the residual sum of squares under H0 is given
for all values of κ(i) , hence the F-test will be maximised where the residual sum
of squares is minimised under HA . The resulting estimator of κ is denoted κ̂.
The average and exponential average test statistics are defined as in (1.26) and
(1.27), respectively. The test statistics are referred to as Sup-C(κ̂), C(κ(i) )ave and
C(κ(i) )exp to reflect their dependence on κ̂ and κ(i) ∈ K, respectively.
In the case of unknown κ (as well as known κ), the finite sample quantiles can
be obtained by simulation or by a natural extension to a bootstrap procedure.
To illustrate, the quantiles of the test statistics with unknown κ were simulated
as in the case for Figure 1.4, but with data generated as follows: y0 is a draw from
N(0, 1/(1 − ϕ12 )), that is a draw from the unconditional distribution of yt , which
assumes stationarity (and normality); subsequent observations were generated
by yt = ϕ1 yt−1 + εt , εt ∼ N(0, 1), with ϕ1 = 0.95 and T = 100. The Chow-type
tests allowed for a search over κ(i) ∈ K = [y(0.1T) ,y(0.9T) ], i = 1, . . . , 81. The CDFs
for F(2, 96), Sup-C(λ̂) and C(λb )ave are shown in Figure 1.5, from which it is
evident that the CDFs are very similar to those in the corresponding Chow case
(see Figure 1.4). The 90% and 95% quantiles of the simulated distributions of
Sup-C(λ̂) are 5.15 and 6.0, and those of C(λb )ave are 1.83 and 2.25, compared
to 2.36 and 3.09 from F(2, 96).
0.9
F(2, 96)
0.8
Sup-C(λb)
C(λb)ave
0.7
0.6
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
0.5
0.4
0.3
0.2
90% quantiles shown
0.1
1.83 2.36 5.15
0
0 1 2 3 4 5 6 7 8 9
simulation the process shocks, εt , are drawn from a specified distribution, usually
N(0, σε2 ), and normally it is possible to set σε2 = 1 without loss. Thus, if the
empirical distribution function is not close to the normal distribution function,
differences are likely to arise in the distribution of the test statistic of interest.
The bootstrap steps are as follows.
Calculate the required test statistics, for example Sup-C(.), C(.)ave and C(.)exp .
3. Generate the bootstrap data as:
The bootstrap innovations ε̂bt are drawn with replacement from the residuals
{ε̂t }Tt=1 . The initial value in the bootstrap sequence of ybt is yb0 , and there are
(at least) two possibilities for this start up observation. One is to choose yb0
at random from {yt }Tt=1 , another, the option taken here, is to choose yb0 = y1 ,
so that the bootstrap is ‘initialised’ by an actual value.
4. Estimate the restricted (null) and unrestricted (alternative) regressions using
bootstrap data over the grid, either λb ∈ or κ(i) ∈ K:
ŷbt = ϕ̂0b + ϕ̂1b yt−1 + I(.)γ̂0b + I(.)γ̂1b ybt−1 , ebt = ybt − ŷbt
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
Calculate the bootstrap versions of the required test statistics, say Sup-Cb (.),
Cb (.)ave and Cb (.)exp . The test statistics may be computed in Wald or F form.
It is usually easier to calculate the test statistics in the form of the difference
in the restricted and unrestricted residual sums of squares; alternatively they
can be calculated using just the unrestricted regression, see Q1.2, in which
b,r
case there is no need to calculate ŷt .
5. Repeat steps 3 and 4 a total of B times, giving B sets of bootstrap data and B
values of the test statistics.
6. From step 5, sort each of the B test statistics and thus obtain the bootstrap
(cumulative) distribution.
7. The quantiles of the bootstrap distribution are now available from step 6;
additionally the bootstrap p-value, pbs , of a sample test statistic can be esti-
mated by finding its position in the corresponding bootstrap distribution. For
example, pbs [Sup-C(.)] = #[Sup-Cb (.) > Sup-C(.)]/B, where # is the counting
function (B + 1 is sometimes used in the denominator, but B should at least
be large enough for this not to make a material difference).
This bootstrap procedure assumes that there is not a unit root under the null
hypothesis. If this is not the case, then the bootstrap data should be generated
with a unit root (see Chapters 6 and 8 and UR, Vol. 1, chapter 8).
3.5
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
2.5
1.5
1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010
where ŷ∗ is the estimated steady state. Two variations are considered, the first
where possible structural instability is related to an unknown breakpoint and
the other to the regime indicator yt−1 ≤ κ, where κ is unknown, so that the
alternative is a two-regime TAR.
The models estimated for each regime, together with their implied steady states,
are as follows.
Regime 1
Regime 2
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
squares is T̂1 = 561, so that λ̂b = 0.473 and this estimate is well inside . The
Chow tests allowing for the search procedure are: Sup-C(λ̂b ) = 8.42, C(λb )ave =
4.21 and C(λb )exp = 2.69. The simulated 95% quantiles (assuming an exogenous
regressor) for T = 1,187 are 5.75, 2.22 and 1.37, respectively, which suggest
rejection of the null hypothesis of no temporal structural break. There are
two variations on obtaining appropriate critical values. First, by way of con-
tinuing the illustration, the quantiles should allow for the stochastic regressor
yt−1 , so a second simulation was undertaken using an AR(1) generating process
yt = ϕ1 yt−1 + εt , with ϕ1 = 0.95 and y0 ∼ N(0, 1/(1 − ϕ12 )), εt ∼ N(0, 1). In this
case, the 95% quantiles were 7.26, 2.49 and 1.55, each slightly larger than in
the case with a nonstochastic regressor. Lastly, the quantiles and p-values were
obtained by bootstrapping the test statistic to allow for the possibility that the
empirical distribution function is generated from a dependent or non-normal
process. The bootstrap 95% quantiles, with B = 1,000, for Sup-C(λ̂b ), C(λb )ave
and C(λb )exp were 6.08, 2.17 and 1.36, with bootstrap p-values, pbs , of 1%, 0.5%
and 1%, respectively.
Two-regime TAR
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
an issue to investigate in a more substantive analysis. (Fitting an AR(2) model,
though preferably to try and pick up some of the cyclical behaviour, resulted in
very similar estimates of κ̂ and ŷ∗ , and does not add to the illustration.)
The Chow-type tests allowing for a search over κ(i) ∈ K were: Sup-C(κ̂) = 7.80,
C(κ(i) )ave = 3.01 and C(κ(i) )exp = 2.07. Two variations were considered to obtain
the quantiles of the null distribution. As in the case of the Chow test for temporal
stability, the generating process was taken to be AR(1), yt = ϕ1 yt−1 + εt , with
ϕ1 = 0.95 and y0 ∼ N(0, 1/(1 − ϕ12 )), εt ∼ N(0, 1). In this case, the 95% quantiles
for Sup-C(κ̂), C(κ(i) )ave and C(κ(i) )exp were 6.37, 2.2 and 1.35, so H0 is rejected.
The choice of ϕ1 = 0.95 was motivated by the LS estimate of ϕ1 , which is fairly
close to the unit root; but having started in this way a natural extension is
to bootstrap the quantiles and p-values. The bootstrap 95% quantiles for Sup-
C(λ̂b ), C(λb )ave and C(λb )exp were 6.82, 2.3 and 1.41, with bootstrap p-values,
pbs , of 2.2%, 0.2% and 0.6%, respectively.
A chapter summary was provided in the preface, so the intention here is not
to repeat that but to focus briefly on the implications of model choice for fore-
casting, which implicitly encapsulates inherent differences, such as the degree
of persistence, between different classes of models. Stock and Watson (1999)
compared forecasts from linear and nonlinear models for 215 monthly macroe-
conomic time series for horizons of one, six and twelve months. The models
considered included AR models in levels and first differences and LSTAR (logis-
tic smooth transition autoregressive) models (see Chapter 5). They used a form
of the ERS unit root test as a pretest (see Elliott, Rothenberg and Stock, 1996,
and UR, Vol. 1, chapter 7). One of their conclusions was that “forecasts at all
horizons are improved by unit root pretests. Severe forecast errors are made in
nonlinear models in levels and linear models with time trends, and these errors
are reduced substantially by choosing a differences or levels specification based
on a preliminary test for a unit root” (Stock and Watson, 1999, p. 30). On the
issue of the importance of stationarity and nonstationarity to forecasting see
Hendry and Clements (1998, 1999).
The study by Stock and Watson was extensive, with consideration of many
practical issues such as lag length selection, choice of deterministic components
and forecast combination, but necessarily it had to impose some limitations. The
class of forecasting models could be extended further in several directions. A unit
root pretest suggests the simple dichotomy of modelling in levels vs differences,
whereas a fractional unit root approach, as in Chapters 3 and 4, considers a con-
tinuum of possibilities; and Smith and Yadav (1994) show there are forecasting
costs to first differencing if the series is generated by a fractionally integrated
process. Second, even if taking account of a unit root seems appropriate there
remains the question of whether the data should first be transformed, for exam-
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
ple in logarithms, so that another possibility is a pretest that includes the choice
of transformation, which itself could affect the outcome of a unit root test, (see
Chapter 2).
The LSTAR models considered by Stock and Watson were one in a class of
nonlinear models that allow smooth transition (ST) between AR regimes, others
include: the ESTAR (exponential STAR) and BRW (bounded random walk), con-
sidered in Chapter 5; piecewise linear models either in the form of threshold
models, usually autoregressive, TAR, as in Chapter 6; or temporally contiguous
models that allow for structural breaks ordered by time, with known breakpoint
or unknown breakpoint, as considered in Chapters 7 and 8, respectively.
Questions
Q1.1. Rewrite the TAR(1) model of Equation (1.30) so that the base model is
Regime 1:
A1.1. Note that indicator function is written so that it takes the value 1 if yt−1 >
κ, whereas in Equation (1.30) it was written as I(yt−1 ≤ κ). Recall that γ0 = ϕ0 −φ0
and γ1 = ϕ1 − φ1 , hence making the substitutions then, as before:
Regime 1
yt = ϕ0 + ϕ1 yt−1 + εt
Regime 2
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
that is, approximately in finite samples W is g times the corresponding F-version
of the test.
A1.2. First rewrite the TAR(1) in an appropriate matrix-vector form:
where:
The restricted model imposes (δ0 , δ1 ) = (0, 0). The restrictions can be expressed
in the form Rβ = s, where
β1
0 0 1 0 β2
0
=
0 0 0 1 β3 0
β4
Although this could be reduced to
1 0 β3 0
=
0 1 β4 0
∂L/ = −2Z y + 2Z Zβ − 2R λ = 0
∂β
∂L/ = −2(Rβ − s) = 0
∂λ
Let br and λr denote the solutions to these first order conditions, then:
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
br = (Z Z)−1 Z y + (Z Z)−1 R λr
= b + (Z Z)−1 R λr
The restricted residual vector and restricted residual sum of squares are then
obtained as follows:
A A = [Z(Z Z)−1 R [R(Z Z)−1 R ]−1 (Rb − s)] [Z(Z Z)−1 R [R(Z Z)−1 R ]−1 (Rb − s)]
= (Rb − s) [R(Z Z)−1 R ]−1 R(Z Z)−1 Z Z(Z Z)−1 R [R(Z Z)−1 R ]−1 (Rb − s)
= (Rb − s) [R(Z Z)−1 R ]−1 R(Z Z)−1 R [R(Z Z)−1 R ]−1 (Rb − s)
= (Rb − s) [R(Z Z)−1 R ]−1 (Rb − s)
Hence:
W = (Rb − s) [σ̃ε,ur2
R(Z Z)−1 R ]−1 (Rb − s)
ε̂ ε̂r − ε̂ur ε̂ur
=T r
ε̂ur ε̂ur
T ε̂r ε̂r − ε̂ur ε̂ur (T − k)
=g
(T − k) ε̂ur ε̂ur g
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
T ε̂ ε̂r − ε̂ur ε̂ur (T − k)
=g C, C= r
(T − k) ε̂ur ε̂ur g
≈ gC
E(S2T ) sums all the elements in the covariance matrix of y, of which there are T
of the form E(y2t ), 2(T – 1) of the form E(yt yt−1 ), 2(T – 2) of the form E(yt yt−2 )
and so on until 2E(y1 yT ), which is the last in the sequence. If the {y2t } sequence
is covariance stationary, then E(y2t ) = γ0 and E(yt yt−k ) = γk , hence:
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
= Tγ0 + 2 (T − j)γj
j=1
T−1
T−1 E(S2T ) = γ0 + 2T−1 (T − j)γj
j=1
T−1
= γ0 + 2 (1 − j T)γj
j=1
2
σy,lr ≡ limT→∞ T−1 E(S2T )
∞
= γ0 + 2 γj
j=1
In taking the limit it is legitimate to take j as fixed and let the ratio j/T tend to
∞
zero. Covariance stationarity implies γk = γ−k , hence γ0 + 2 ∞ j=1 γj = j=−∞ γj
and, therefore:
∞
2
σy,lr = γj
j=−∞
A1.3.ii. First back substitute to express yt in terms of current and lagged εt and
then obtain the variance and covariances:
∞
yt = ρi εt−i
i=0
γ0 ≡ var(yt )
∞
= ρ2i E(ε2t−i ) using cov(εt εs ) = 0 for t = s
i=0
= ρk (1 − ρ2 )−1 σε2
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
= (1 − ρ)−2 σε2
Next note that yt = ρyt−1 + εt may be written as yt = w(L)εt where w(L) =
(1 − ρL)−1 , hence:
var[w(1)εt ] = w(1)2 σε2
= (1 − ρ)−2 σε2 as w(1)2 = (1 − ρ)−2
2
= σy,lr
A1.3.iii. The sdf for a (covariance stationary) process generating data as yt =
w(L)εt is:
fy (λj ) = |w(e−iλj )|2 fε (λj ) λj ∈ [−π, π]
where w(e−iλj ) is w(.) evaluated at e−iλj , and fε (λj ) = (2π)−1 σε2 , for all λj , is the
spectral density function of the white noise input εt . Thus, the sdf of yt is the
power transfer function of the filter, |w(e−iλj )|2 , multiplied by the sdf of white
noise. If λj = 0, then:
fy (0) = |w(e0 )|2 fε (0)
= (2π)−1 w(1)2 σε2
= (2π)−1 σy,lr
2
The sdf may also be defined directly in terms of either the Fourier transform
of the autocorrelation function, as in Fuller (1996, chapter 3), or of the auto-
covariance function, as in Brockwell and Davis (2006, chapter 4), the resulting
definition differing only by a term in γ(0). We shall take the latter representation
so that
k=∞
fy (λj ) = (2π)−1 γ(k)e−ikλj
k=−∞
Hence if λj = 0, then:
k=∞
fy (0) = (2π)−1 γ(k)
k=−∞
= (2π)−1 σy,lr
2
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
Introduction
The chapter starts with an issue that is often not explicitly addressed in empir-
ical work. It is usually not known whether a variable of interest, yt , should be
modelled as it is observed, referred to as the levels (or ‘raw’ data), or as some
transformation of yt , for example by taking the logarithm or reciprocal of the
variable. Regression-based tests are sensitive to this distinction, for example
modelling in levels when the random walk is in the logarithms, implying an
exponential random walk, will affect the size and power of the test.
The second topic of this chapter is to consider nonparametric or semi-
parametric tests for a unit root, which are in contrast to the parametric tests
that have been considered so far. The DF tests, and their variants, are parametric
tests in the sense that they are concerned with direct estimation of a regression
model, with the test statistic based on the coefficient on the lagged dependent
variable. In the DF framework, or its variants (for example the Shin and Fuller,
1998, ML approach), the parametric structure is that of an AR or ARMA model.
Nonparametric tests use less structure in that no explicit parametric framework
is required and inference is based on other information in the data, such as
ranks, signs and runs. Semi-parametric tests use some structure, but it falls short
of a complete parametric setting; an example here is the rank score-based test
outlined in Section 2.3, which is based on ranks, but requires an estimator of
the long-run variance to neutralise it against non-iid errors.
This chapter progresses as follows. The question of what happens when a
DF-type test is applied to the wrong (monotonic) transformation is elaborated
in Section 2.1. If the unit root is in the log of yt , but the test is formulated
in terms of yt , there is considerable ‘over-rejection’ of the null hypothesis, for
example, an 80% rejection rate for a test at a nominal size of 5%, which might
be regarded as incorrect. However, an alternative view is that this is a correct
decision, and indeed 100% rejection would be desirable as the unit root is not
28
present in yt (in the terminology of Section 2.2, the generating process is a log-
linear integrated process not a linear integrated process). There are two possible
responses to the question of whether if a unit root is present it is in the levels or
the logs. A parametric test, due to Kobayashi and McAleer (1999) is outlined in
Section 2.2; this procedure involves two tests reflecting the non-nested nature
of the alternative specifications. This combines a test in which the linear model
is estimated and tested for departures in the direction of log-linearity and then
the roles are reversed, resulting in two test statistics. An alternative approach
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
is to use a test for a unit root that is invariant to a monotonic transformation,
such as the log transformation.
A test based on the ranks is necessarily invariant to monotonic transforma-
tions. Hence, a more general question can be posed: is there a unit root in the
process generating yt or some monotonic transformation of yt ? Thus, getting
the transformation ‘wrong’, for example using levels when the data should be in
logs, will not affect the outcome of such a test. Equally you don’t get something
for nothing, so the information from such a test, for example that the unit root
is not rejected, does not tell you which transformation is appropriate, just that
there is one that will result in H0 not being rejected.
Some unit root tests based on ranks are considered in Section 2.3. An obvious
starting point is the DF test modified to apply to the ranks of the series. The
testing framework is to first rank the observations in the sample, but then use
an AR/ADF model on the ranked observations. It turns out that this simple
extension is quite hard to beat in terms of size retention and power. Another
possibility is to take the parametric score or LM unit root test (see Schmidt and
Phillips, 1992), and apply it to the ranks. This test deals with the problem of
weakly dependent errors in the manner of the PP tests by employing a semi-
parametric estimator of the long-run variance, although an AR-based estimator
could also be used.
The rank ADF test and the rank-score test take parametric tests and modify
them for use with the ranked observations. A different approach is to construct a
nonparametric test without reference to an existing parametric test. One exam-
ple of this approach, outlined in Section 2.4, is the range unit root test, which is
based on assessing whether the marginal sample observation represents a new
record in the sense of being larger than the existing maximum or smaller than
the existing minimum, the argument being that stationary and nonstationary
series add new records at different rates, which will serve to distinguish the two
processes.
Finally, Section 2.5 considers a test based on the variance-ratio principle. In
this case, the variance of yt generated by a nonstationary process with a unit
root grows at a different rate compared to when yt is generated by a stationary
process, and a test can be designed around this feature. The test described, due
to Breitung (2002), which is particularly simple to construct, has the advantage
that it does not require specification of the short-run dynamics, and in that
sense it is ‘model free’.
Some simulation results and illustrations of the tests in use are presented in
Section 2.6. The emphasis in the simulation results is on size retention and
relative power of the different tests; however, the advantages of nonparametric
and semi-parametric tests are potentially much wider. In different degrees they
are derived under weaker distributional assumptions than parametric tests and
are invariant to some forms of structural breaks and robust to misspecification
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
of generally unknown short-run dynamics.
wt = wt−1 + εt (2.1)
wt = β1 + wt−1 + εt (2.2)
where εt ∼ iid(0, σε2 ), σε2 < ∞. The two cases considered are the no transformation
case, wt = yt , and the log transformation case, wt = ln yt . In the first case, yt
follows a random walk (RW) in the natural units of yt , and in the second case yt
follows an exponential random walk (ERW), which is a log-random walk; thus,
in the second case, the generating process is, first without drift:
ln yt = ln yt−1 + εt (2.3)
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
(ε1 +···+εt )
yt = y 0 e y0 > 0 (2.4b)
ln yt = β1 + ln yt−1 + εt (2.5)
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
yt > 0 for all t in the sample, then KD (op. cit., Theorem 2) show that both
as σε2 → 0 and as σε2 → ∞, then ρ̂ ⇒ 1, implying P(δ̂ < δα |H0 ) → 0; that is, the
rejection probability, conditional on yt > 0, tends to zero, which is the limit to
under-rejection. The simulation results reported below confirm this result for δ̂,
but indicate that it does not apply generally when there is a constant and/or a
trend in the maintained regression.
In Case 2, that is the random walk is in logs, wt = ln yt , but the test is applied to
the levels, yt , the rejection probabilities increase with σε2 and T; under-rejection
occurs only for small sample sizes or for very small values of the innovation
variance, for example σε2 = 0.01 (see KD, op. cit., theorem 1). Thus, generally,
the problem is of over-rejection. These results are confirmed below and do carry
across to the maintained regression with either a constant and/or a trend and
to the pseudo-t tests.
Case 1
KD (op. cit.) reported that the rejection probabilities for δ̂ in Case 1 were zero
for all design parameters tried (for example, variations in σε2 , T and starting
values, and introducing drift in the DGP), and this result was confirmed in
the simulations reported in Table 2.1, which are based on 50,000 replications.
However, some qualifications are required that limit the generality of this result.
First, τ̂ was under-sized, but at around 2.6%–2.8% rather than 0%, for a nominal
5% test. The biggest difference arises when the maintained regression includes
Table 2.1 Levels or logs: incorrect specification of the maintained regression, impact on
rejection rates of DF test statistics (at a 5% nominal level)
T = 100
σε2 = 0.5 0.0% 4.8% 4.5% 2.7% 5.2% 4.2%
σε2 = 1.0 0.0% 5.2% 4.8% 2.6% 5.3% 4.7%
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
T = 200
σε2 = 0.5 0.0% 5.7% 5.6% 2.8% 5.3% 4.6%
σε2 = 1.0 0.0% 5.4% 5.2% 2.6% 5.1% 4.3%
Notes: Case 1 is where the random walk is in ‘natural’ units, wt = yt , but the testing takes place using
ln yt ; Case 2 is where there is an exponential random walk, so the correct transformation is wt = ln yt , but
the testing takes place using yt . The test statistics are of the general form δ̂ = T(ρ̂ − 1) and τ̂ = (ρ̂ − 1)/σ̂(ρ̂),
where σ̂(ρ̂) is the estimated standard error of ρ̂; a subscript indicates the deterministic terms included in
the maintained regression, with μ for a constant and β for a constant and linear trend. The maintained
regressions are deliberately misspecified; for example in Case 1, the maintained regression is specified
in terms of ln yt and ρ̂ is the estimated coefficient on ln yt−1 and in Case 2 the maintained regression is
specified in terms of yt and ρ̂ is the estimated coefficient on yt−1 .
a constant and/or a trend. In these cases, both the n-bias (δ̂) and pseudo-t tests
(τ̂) maintain their size.
Case 2
The results in KD (op. cit.) are confirmed for δ̂ in Case 2. Further, the over-
rejection is found to be more general, applying to τ̂, and to the δ̂-type and
τ̂-type tests when the maintained regression includes a constant and/or a trend;
however, the over-rejection tends to decline as more (superfluous) deterministic
terms are included in the maintained regression. For example, the rejection rates
for σε2 = 1.0 and T = 100 are as follows: τ̂, 93.1%; τ̂μ , 89.2%; and τ̂β , 84.1%. Also
the rejection rates tend to increase as σε2 increases.
The over-rejection in Case 2 is sometimes looked on as an incorrect decision –
there is a unit root so it might be thought that the test should not reject; how-
ever, it is the correct decision in the sense that the unit root is not in the levels
of the series, it is in the logs. Failure to reject could be more worrying as mod-
elling might then proceed incorrectly in the levels, whereas it should be in the
logarithms.
Kobayashi and McAleer (1999a), hereafter KM, have suggested a test to discrim-
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
inate between an integrated process defined alternately on the levels or the
logarithms of the data. In each testing situation, the asymptotic distribution
of the test statistic depends on whether there is drift in the unit root process.
Such tests are likely to be of particular use when a standard unit root test, such
as one of the DF tests, indicates that the series both in the levels and in the
logarithms is I(1). In this case, the tests are not sufficiently discriminating and
the appropriate KM test is likely to be useful.
wt = β1 + wt−1 + zt (2.7a)
p
ψ(L)zt = εt ψ(L) = 1 − ψi Li (2.7b)
i=1
⇒
p
1− ψi Li (wt − β1 ) = εt (2.8)
i=1
If the data are generated by a linear integrated process, LIP, with drift, then
wt = yt ; whereas if the data are generated by a log-linear integrated process,
LLIP, with drift, then wt = ln(yt ). In both cases KM assume that εt ∼ iid(0, σε2 ),
E(ε3t ) = 0, E(ε4t ) < ∞, and the roots of ψ(L) are outside the unit circle (see UR Vol.1,
Appendix 2). The drift parameter β1 is assumed positive, otherwise negative
values of yt are possible, thus precluding a log-linear transformation. εt and the
β̂∗1
β̂1 = (2.9c)
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
ψ̂(1)
1 T
σ̂ε2 = ε̂2 (2.9d)
T − (p + 1) t=p+1 t
Note that when testing the LIP against the LLIP, the estimator and residuals are
defined in terms of wt = yt ; whereas when testing the LLIP against the LIP, the
estimator and residuals are defined in terms of wt = ln yt . The following test
statistics, derived by KM (op. cit.), make the appropriate substitutions.
1.i V1 : LIP with drift is the null model: test for departure in favour of the LLIP;
1.ii U1 : LIP without drift is the null model: test for departure in favour of the
LLIP;
2.i V2 : LLIP with drift is the null model: test for departure in favour of the LIP;
2.ii U2 : LLIP without drift is the null model: test for departure in favour of the
LIP.
1.i The first case to consider is when the null model is the LIP with drift, then:
T
T−3/2 yt−1 (ε2t − σε2 ) ⇒D N(0, ω2 ) (2.11)
t=p+1
β21 4
ω2 = σ
6 ε
⇒ Test statistic, V1 :
−3/2 1 T 2 2
V1 ≡ T yt−1 (ε̂t − σ̂ε ) ⇒D N(0, 1) (2.12)
ω̂ t=p+1
1/2
β̂21 4
ω̂ = σ̂
6 ε
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
Comment: when there is drift, then with an appropriate standardisation, the
test statistic is asymptotically normally distributed with zero mean and unit
variance, and the sample value is compared to a selected upper quantile of N(0,
1), for example the 95% quantile; large values lead to rejection of the null of a
LIP with drift.
1.ii The next case is when the null model is the LIP without drift, then:
T 1 1 1
T−1 yt−1 (ε2t − σε2 ) ⇒D ω B1 (r)dB2 (r) − B1 (r)dr dB2 (r)
t=p+1 0 0 0
√ σε3
ω= 2 (2.13)
ψ(1)
⇒ Test statistic, U1 :
1 1 1 1
U1 = T−1 yt−1 (ε̂2t − σ̂ε2 ) ⇒D B1 (r)dB2 (r) − B1 (r)dr dB2 (r)
ω̂ 0 0 0
√ σ̂ε3
ω̂ = 2 (2.14)
ψ̂(1)
where B1 (r) and B2 (r) are two standard Brownian motion processes with zero
covariance.
Comment: when drift is absent, the test statistic has a non-standard distri-
bution; the null hypothesis (of a LIP) is rejected when the sample value of U1
exceeds the selected (upper) quantile; some quantiles are provided in Table 2.2
below.
2.i. The second case to consider is when the null model is the LLIP with drift,
then:
T
T−3/2 − ln yt−1 (ε2t − σε2 ) ⇒D N(0, ω2 ) (2.15)
t=p+1
2 β21 4
ω = σ
6 ε
⇒ Test statistic, V2 :
1 T
V2 ≡ T−3/2 − ln yt−1 (ε̂2t − σ̂ε2 ) ⇒D N(0, 1) (2.16)
ω̂ t=p+1
1/2
β̂21 4
ω̂ = σ̂
6 ε
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
as standard normal, and the sample value is compared with a selected upper
quantile of N(0, 1); large values lead to rejection of the null of a LLIP with drift.
2.ii. The next case is when the null model is the LLIP without drift:
T 1 1 1
T−1 − ln yt−1 (ε2t − σε2 ) ⇒D − B1 (r)dB2 (r) − B1 (r)dr dB2 (r)
t=p+1 0 0 0
√ σϕ3
= 2 (2.17)
ψ(1)
⇒ Test statistic, U2 :
1 1 1
1
U2 = T−1 yt−1 (ε̂2t − σ̂ε2 ) ⇒D − B1 (r)dB2 (r) − B1 (r)dr dB2 (r)
ˆ 0 0 0
√ σ̂ε3
ˆ = 2 (2.18)
ψ̂(1)
where W1 (r) and W2 (r) are two Brownian motion processes with zero
covariance.
Comment: as in the LIP case, ‘large’ values of U2 lead to rejection of the null,
in this case that the DGP is a LLIP.
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
(9.2) (9.6) (4.5) (4.6) (0.8) (0.9)
1,000 1.32 1.28 1.65 1.62 2.36 2.30
(10.7) (10.0) (5.0) (4.7) (1.1) (0.9)
∞ 1.282 1.282 1.645 1.645 2.326 2.326
(10.0) (10.0) (5.0) (5.0) (1.0) (1.0)
Notes: Table entries are quantiles, with corresponding size in (.) brackets, where size is the empirical
size using quantiles from N (0, 1) for Table 2.2a and from the quantiles for T = ∞ for Table 2.2b; the
latter are from KM (op. cit., table 1). Results are based on 20,000 replications. A realised value of the
test statistic exceeding the (1 – α)% quantile leads to rejection of the null model at the α% significance
level.
The likely numerical magnitude of the drift will depend on whether the integrated process is linear or loglinear and on the frequency of the data. In Table 2.2a, the critical values were presented for β1 = σε, which is 1 in units of the innovation standard error (β1/σε = 1). Table 2.3 considers the sensitivity of these results to the magnitude of the drift for the case T = 100 and κ = β1/σε; as in Table 2.2a the results are in the form of quantiles with size in (.) parentheses, where the size is the empirical size if the quantiles from N(0, 1) are used.
The results in Table 2.3 show that, apart from small values of κ, for example κ = 0.1, the quantiles are robust to variations in κ, but there is a slight under-sizing throughout for ψ1 = 0.0; the actual size of a test at the nominal 5% level
[Table 2.3 (final rows): quantiles, with empirical size in (.) beneath]

         90%             95%             99%
κ = 2.0  1.22   1.24     1.57   1.61     2.24   2.27
         (9.2)  (9.2)    (4.1)  (4.7)    (0.8)  (0.8)
κ = 3.0  1.21   1.29     1.57   1.62     2.19   2.21
         (8.9)  (10.3)   (4.2)  (4.8)    (0.8)  (0.6)
κ = 4.0  1.22   1.29     1.57   1.63     2.26   2.29
         (8.9)  (10.3)   (4.2)  (4.8)    (0.8)  (0.9)
N(0, 1)  1.282  1.282    1.645  1.645    2.326  2.326
         (10.0) (10.0)   (5.0)  (5.0)    (1.0)  (1.0)
being between about 4% and 4.5%. However, the empirical size is closer to the
nominal size for ψ1 = 0.8. Overall, whilst there is some sensitivity to sample size
and the magnitudes of ψ1 and β1 , the quantiles are reasonably robust to likely
variations.
2.2.2.iii Motivation
Whilst four test statistics were outlined in the previous section, they are motivated by similar considerations. In general, fitting the wrong model will induce heteroscedasticity, which is detected by assessing the (positive) correlation between the squared residuals from the fitted model and y_{t−1} in the case of the LIP and −ln y_{t−1} in the case of the LLIP.
To illustrate, consider the case where the DGP is the exponential random walk,
ERW, which is the simplest LLIP, but the analysis is pursued in levels; further,
assume initially that zt = εt . Then:
Δy_t = (y_t/y_{t−1} − 1) y_{t−1}   (2.19)
     = η y_{t−1}   (2.20)

where η = y_t/y_{t−1} − 1 = (e^{(ε_t + β₁)} − 1).
Given that ε_t and y_{t−1} are uncorrelated, the conditional variance of Δy_t given y_{t−1} is proportional to y²_{t−1} when the DGP is the ERW, implying that (Δy_t)² and y²_{t−1} are positively correlated. However, when the DGP is a LIP, there is no correlation between (Δy_t)² and y²_{t−1}; thus, the correlation coefficient between (Δy_t)² and y²_{t−1} provides a diagnostic test statistic when the linear model is fitted. KM (op. cit.) find that the asymptotic distribution is better approximated when the test statistic is taken as the correlation coefficient between (Δy_t)² and y_{t−1}, with large values leading to rejection of the linear model. When z_t follows an AR process, then ε_t², or ε̂_t² in practice, replaces (Δy_t)², which should be uncorrelated with y_{t−1} under the null of the LIP.
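The diagnostic logic can be sketched as follows; the drift-only first-difference fits and the function name are illustrative assumptions, not KM's exact regressions:

```python
import numpy as np

def km_diagnostic_correlations(y):
    """Correlation of squared residuals with y_{t-1} (linear null) and with
    -ln y_{t-1} (log-linear null); a clearly positive value signals
    misspecification-induced heteroscedasticity in that fitted model."""
    dy, dln = np.diff(y), np.diff(np.log(y))
    e_lin = dy - dy.mean()            # residuals, linear model with drift
    e_log = dln - dln.mean()          # residuals, log-linear model with drift
    r_lin = np.corrcoef(e_lin**2, y[:-1])[0, 1]
    r_log = np.corrcoef(e_log**2, -np.log(y[:-1]))[0, 1]
    return r_lin, r_log

# Example: data from an exponential random walk with drift (a LLIP DGP)
rng = np.random.default_rng(0)
y = np.exp(np.cumsum(0.002 + 0.02 * rng.standard_normal(5000)))
r_lin, r_log = km_diagnostic_correlations(y)
# expect r_lin clearly positive (linear fit misspecified), r_log near zero
```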
is based on the ‘with drift’ or ‘without drift’ model. Some of the results from
the power simulations in KM (op. cit.) are summarised in Table 2.4. Evidently
the test when there is drift is considerably more powerful than in the without
drift case. Even with T = 100, power is 0.96 when the fitted model is linear,
but the DGP is actually log-linear, and 0.81 when the fitted model is log-linear,
but the DGP is actually linear; however, in the absence of drift, these values
drop to 13% and 12% in the former case and only 3% and 4% in the latter case,
respectively. This suggests caution in the use of the non-drifted tests, which
require quite substantial sample sizes to achieve decent power. However, KM (op. cit.) found that their tests are more powerful, sometimes considerably so, than an alternative test for heteroscedasticity, in this case the well-known LM test for ARCH; see Kobayashi and McAleer (1999b) for a power comparison of a number of tests.
2.2.3 Illustrations
As noted, the KM non-nested tests are likely to be of greatest interest when a standard unit root test is unable to discriminate which of two integrated processes generated the data, finding that both the levels and the logarithms
support non-rejection of the unit root. An important prior aspect of the use of
the tests is whether the integrated process is with or without drift. Also, since
the issue is whether the unit root is in either the levels or the logarithms, the
data should be everywhere positive. The economic context is thus likely to be of
importance, for example the default for aggregate expenditure variables, such
as consumption, GDP, imports and so on, is that if generated by an integrated
process, there is positive drift; whereas series such as unemployment rates are
unlikely to be generated by a drifted IP.
daily London Fix prices for the period 2nd January, 1985 to 31st March 2006;
weekends and bank holidays are excluded, giving an overall total of 5,372 obser-
vations. This series, referred to as yt, is obviously everywhere positive, so there is an issue to be decided about whether it is generated by an LIP or an LLIP.
In neither case is there evidence of drift (see Figure 2.1, left-hand panel for the
‘levels’ data and right-hand panel for the log data), and the appropriate KM tests
are, therefore, U1 and U2 , that is the driftless non-nested tests. The levels series
has a maximum of 100.8 and a minimum of 38.3, and it crosses the sample
mean of 68.5, 102 times, that is just 1.9% of the sample observations, which
is suggestive of random walk-type behaviour. The size of the sample provides
a safeguard against the low power indicated by the simulation results for the
driftless tests.
For illustrative purposes, the standard ADF test τ̂μ was used. The maximum
lag length was set at 20, but in both the linear and loglinear versions of the
ADF regressions BIC selected ADF(1); in any case, longer lags made virtually
no difference to the value of τ̂μ , which was –2.433 for the LIP and –2.553 for
the LLIP, both leading to non-rejection of the null hypothesis at conventional
significance levels, for example, the 5% critical value is –2.86. For details of the
regressions, see Table 2.5.
As an indicator of the likely result of the KM tests, note that the correla-
tion coefficient between the squared residuals and yt−1 in the linear case is
0.004, indicating no misspecification, whereas for the log-linear case the cor-
relation coefficient between the squared residuals and − ln yt−1 is 0.107. The
latter correlation suggests that if a unit root is present it is in the linear model,
which is confirmed by the sample value of U1 = 0.097, which is not significant,
whereas U2 = 4.57, which is well to the right of the 99th percentile of 1.116, see
Table 2.2b.
[Figure 2.1 Daily gold price, 1985–2005: left-hand panel, levels; right-hand panel, logs.]
V2 , which allow for drift; and the DF pseudo-t test is now τ̂β , which includes a
time trend in the maintained regression to allow the alternative hypothesis to
generate a trended series.
Whilst, in this example, BIC suggested an ADF(1) regression, the marginal-t
test criterion indicated that higher order lags were important and, as the sample
is much smaller than in the first illustration, an ADF(5) was selected as the test
model. As it happens, the KM test statistics were, however, virtually unchanged
[Figure 2.2 World oil production, c. 1973–2006: left-hand panel, levels (×10⁴); right-hand panel, logs.]
between the two specifications and the conclusions were not materially different
for the different lag lengths. As in the first illustration, the DF test is unable to discriminate between the linear and log specifications, with τ̂β = −1.705 and τ̂β = −1.867, respectively, neither of which leads to rejection of the null hypothesis, with a 5% critical value of −3.41; see Table 2.6 for the regression details.
The correlation coefficient between the squared residuals and yt−1 in the linear
case is –0.094, indicating no misspecification in the log-linear direction, whereas
for the log-linear case the correlation coefficient between the squared residuals
and − ln yt−1 is 0.132. The KM test values are V1 = –4.77, which is wrong-signed
for rejection of the null hypothesis, and V2 = 15.83, which has a p-value of zero
under the null. Hence, if there is an integrated process, then the conclusion is
in favour of a linear model rather than a log-linear model.
A problem with applying standard parametric tests for a unit root is that the
distribution of the test statistic under the null and alternative is not generally
invariant to a monotonic transformation of the data. This was illustrated in
Section 2.1 for the DF tests for the case of loglinear transformations, although
Table 2.6 Regression details: world oil production

KM tests
Null, linear; alternative, log-linear:     V1 = −4.77    95% quantile = 1.645
Null, log-linear; alternative, linear:     V2 = 15.83    95% quantile = 1.645
it holds more generally (see GH, 1991). Basing a unit root test on ranks avoids
this problem.
The situation to be considered is that the correct form of the generating pro-
cess is not known other than, say, it is a monotonic and, hence, order-preserving
function of the data yt ; thus, as before, let yt be the ‘raw’ data and let wt = f(yt )
be some transformation of the data, such that wt is generated by an autoregres-
sive process. The linear versus log-linear choice was a special case of the more
general situation. In the (simplest) first order case, the generating process is:
w_t = ρw_{t−1} + ε_t,   ε_t ∼ iid(0, σ_ε²), σ_ε² < ∞   (2.21)
The parameter ρ is identified and, hence, we can form the null hypothesis, H0 :
there exists a monotonic function wt = f(yt ) such that ρ = 1 in wt = ρwt−1 + εt .
A ranks-based test proceeds as follows. Consider the values of Y = {y_t}_{t=1}^{T} ordered into a set (sequence) such that Y_(r) = {y_(1) < y_(2) < . . . < y_(T)}; then y_(1) and y_(T) are the minimum and maximum values in the complete set of T observations. To find the rank, r_t, of y_t, first obtain its place in the ordered set Y_(r); then its rank is just the subscript value; the set of ranks is denoted r. For example, if {y_t}_{t=1}^{3} = {y_1 = 3, y_2 = 2, y_3 = 4}, then Y_(r) = {y_(1) = y_2, y_(2) = y_1, y_(3) = y_3}, and the ranks are r = {r_t}_{t=1}^{3} = {2, 1, 3}. Most econometric software contains a routine to sort and rank a set of observations.
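The ranking step in the worked example can be done in a few lines of numpy (a sketch; dedicated routines such as scipy.stats.rankdata do the same job):

```python
import numpy as np

def ranks(y):
    """Rank of each observation: its position in the ordered set Y_(r)."""
    order = np.argsort(y, kind="stable")    # indices that sort y
    r = np.empty(len(y), dtype=int)
    r[order] = np.arange(1, len(y) + 1)     # invert the sorting permutation
    return r

print(ranks(np.array([3, 2, 4])))           # [2 1 3], as in the worked example
```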
R_t = ρ_r R_{t−1} + ν_t   (2.22)

τ̂_{r,μ} = γ̂_r / σ̂(γ̂_r) = (ρ̂_r − 1) / σ̂(ρ̂_r)   (2.25)
where ρ̂r is the LS estimator based on the ranks and the testing procedure is
otherwise as in the standard case.
and τ̂r,β .
In the event that the ν_t are serially correlated, GH suggested a rank extension of the augmented DF test, say RADF, in which the maintained regression (here using the centred ranks) is:

ΔR_t = γ_r R_{t−1} + Σ_{j=1}^{k−1} c_{r,j} ΔR_{t−j} + ν_t   (2.26)
Although this version of a rank test has not been justified formally, the simula-
tion results reported in Section 2.6, show that the RADF test, with the simulated
quantiles for the DF rank test, does perform well; see also Fotopoulus and
Ahn (2003). By way of comparison, the rank-score test due to Breitung and
Gouriéroux (1997), hereafter BG, considered in the next Section, 2.3.2, does
allow explicitly for serially correlated errors.
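A sketch of the RADF regression (2.26) using plain least squares; the function and its lag conventions are illustrative assumptions, and the resulting statistic must be compared with simulated rank quantiles (for example, Table 2.8), not the standard DF tables:

```python
import numpy as np

def radf_tstat(y, k=2):
    """Sketch of the RADF pseudo-t: OLS of Delta R_t on R_{t-1} and k-1
    lagged Delta R terms, where R_t are the mean-adjusted ranks of y_t;
    returns the t-ratio on R_{t-1}."""
    T = len(y)
    r = np.empty(T)
    r[np.argsort(y, kind="stable")] = np.arange(1.0, T + 1)
    R = r - (T + 1) / 2.0                     # mean-adjusted ranks
    dR = np.diff(R)                           # Delta R_t for t = 2, ..., T
    yv = dR[k-1:]                             # dependent variable, t = k+1, ..., T
    cols = [R[k-1:T-1]]                       # R_{t-1}
    for j in range(1, k):
        cols.append(dR[k-1-j:len(dR)-j])      # Delta R_{t-j}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, yv, rcond=None)
    resid = yv - X @ beta
    s2 = resid @ resid / (len(yv) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return beta[0] / se

rng = np.random.default_rng(3)
t_stat = radf_tstat(np.cumsum(rng.standard_normal(200)), k=3)
```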
BG (op. cit.) show that the asymptotic distribution of τ̂_{r,μ} is not the same as that of the corresponding parametric test, the key difference being that T^{−1} r_{[rT]} does not converge to a Brownian motion, where T is the sample size, 0 ≤ r ≤ 1 and
[.] represents the integer part. Fotopoulus and Ahn (2003) extend these results
to obtain the relevant asymptotic distributions. If the standard DF quantiles are
used, then the rank tests tend to be oversized. The extent of the over-sizing is
illustrated in Table 2.7, which reports the results of a small simulation study
where the innovations are drawn from one of N(0, 1) and t(2). Although the
wrong sizing is not great, for example 6.1% for a nominal 5% test with T = 200,
the over-sizing does not decline with the sample size.
Table 2.7 Size of δ̂_{r,μ} and τ̂_{r,μ} using quantiles for δ̂_μ and τ̂_μ
Notes: results are for 50,000 replications of a driftless random walk; the δ̂_μ and τ̂_μ tests are from a DF regression with an intercept; δ̂_{r,μ} and τ̂_{r,μ} use the mean-adjusted ranks.

[Table 2.8 (extract): quantiles for δ̂_{r,β} and τ̂_{r,β}]

          δ̂_{r,β}                      τ̂_{r,β}
T         100      200      500        100     200     500
1%        −31.30   −32.67   −34.14     −4.20   −4.20   −4.20
5%        −23.17   −24.53   −25.68     −3.60   −3.60   −3.63
10%       −19.80   −20.85   −21.85     −3.29   −3.30   −3.35
Rather than use the percentiles for the standard DF tests, the percentiles can
be simulated; for example, some critical values for δ̂r,μ , τ̂r,μ , δ̂r,β and τ̂r,β for
T = 100, 200 and 500 are provided in Table 2.8. The maintained regression for
these simulations used the mean-adjusted ranks, with 50,000 replications of the
null of a driftless random walk. Other sources for percentiles are Fotopoulus and
Ahn (2003), who do not use mean-adjusted ranks, and GH (1991), who include a
constant in the maintained regression rather than use the mean-adjusted ranks.
As in the case of the standard DF tests, a sample value more negative than the critical value leads to rejection of the null of a unit root in the data.
More precisely, rejection of the null carries the implication that, subject to the
limitations of the test, there is not an order-preserving transformation of the data that provides support for the null hypothesis. Non-rejection implies that there is such a transformation (or, trivially, that no transformation is necessary) that is consistent with a unit root AR(1) process; but it does not indicate what the transformation is, which has to be the subject of further study. Other rank-based
tests are considered in the next sections.
H0 : Δy_t = β₁ + ε_t   (2.27)
HA : (1 − ρL)(y_t − β₀ − β₁t) = ε_t,  |ρ| < 1   (2.28)
where εt ∼ iid and E(εt ) = 0, with common cdf denoted F(εt ). The aim is to
obtain a test statistic for H0 that can be extended to the case where the sequence
{ε_t} comprises serially correlated elements. It is convenient here to adopt the notation that the sequence starts at t = 0, thus {ε_t}_{t=0}^{T}.
The test statistic is an application of the score, or LM, principle to ranks. In
the parametric case, the test statistic for H0 is:
S_β = [Σ_{t=2}^{T} x_t S_{t−1}] / [Σ_{t=2}^{T} S²_{t−1}]   (2.29)

where x_t = Δy_t − β̂₁, which is the demeaned data under the null, where β̂₁ is the sample mean of Δy_t, that is (y_T − y_0)/T; and S_t is the partial sum of x_t, given by S_t = Σ_{j=1}^{t} x_j.
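The computation in (2.29) is only a few lines (a sketch of the point estimate; the statistic's null distribution is nonstandard):

```python
import numpy as np

def score_stat(y):
    """Sketch of the score (LM) statistic S_beta in (2.29): x_t are the
    demeaned first differences (beta1-hat = (y_T - y_0)/T is their mean),
    S_t the partial sums; returns sum(x_t S_{t-1}) / sum(S_{t-1}^2)."""
    x = np.diff(y)
    x = x - x.mean()
    S = np.cumsum(x)                       # S_t = x_1 + ... + x_t
    return np.sum(x[1:] * S[:-1]) / np.sum(S[:-1]**2)

s = score_stat(np.cumsum(np.random.default_rng(5).standard_normal(300)))
```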
Analogously, the rank version of this test is:

S_{r,β} = [Σ_{t=2}^{T} r̃_t S^R_{t−1}] / [Σ_{t=2}^{T} (S^R_{t−1})²]   (2.30)

where r̃_t is the (demeaned) rank of Δy_t among (Δy_1, . . . , Δy_T); that is, the ordering is now with respect to Δy_t, such that Δy_(1) < Δy_(2) < . . . < Δy_(T). The ordering is with respect to Δy_t rather than y_t as, under the LM principle, the null of a unit root is imposed. A β subscript indicates that the null and alternative hypotheses allow trended behaviour in the time series, either due to drift under H0 or a trend under HA.
S^R_t is the partial sum to time t of the (demeaned) ranks of Δy_t, that is S^R_t = Σ_{j=1}^{t} r̃_j. The ranks of the differences are demeaned by subtracting (T + 1)/2, or T/2 if the convention is that t = 1 is the start of the process. That is, let R̃(Δy_t) be the rank of Δy_t among (Δy_1, . . . , Δy_T), then define:

r̃_t ≡ R̃(Δy_t) − (T + 1)/2   demeaned ranks   (2.31)

r̃^n_t ≡ r̃_t / T   (2.32)
BG (op. cit.) show that the numerator of S_{r,β} is a function of T only, in which case a test statistic can be based just on the denominator, with their test statistic a function of λ, where:

λ = Σ_{t=1}^{T} (S^R_t)²   (2.33)

This expression differs from the denominator of S_{r,β} only by the inclusion of (S^R_T)².
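Pulling (2.31) and (2.33) together, a minimal sketch (the normalisation and scaling needed for λ_{r,β} itself are left to the main text):

```python
import numpy as np

def bg_lambda(y):
    """lambda of (2.33): ranks of the first differences, demeaned by
    (T + 1)/2 as in (2.31), partial summed, squared and summed."""
    d = np.diff(y)
    T = len(d)
    r = np.empty(T)
    r[np.argsort(d, kind="stable")] = np.arange(1.0, T + 1)
    r_tilde = r - (T + 1) / 2.0            # demeaned ranks, (2.31)
    SR = np.cumsum(r_tilde)                # partial sums S^R_t
    return np.sum(SR**2)                   # (2.33)

lam = bg_lambda(np.array([1.0, 3.0, 2.0, 6.0, 4.0]))  # differences 2, -1, 3, -2
```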
where r̃tn is given by (2.32). The distributional results are based on the following
theorem (BG, op. cit., Theorem and Remark A):
λ_{r,β} ⇒_D ∫₀¹ (B(r) − rB(1))² dr   (2.35)
Finite sample critical values are given in BG (op. cit.), some of which are
extracted into Table 2.9 below. The test is ‘left-sided’ in the sense that a sam-
ple value less than the α% quantile implies rejection of the null hypothesis. The
null distribution of λr,β is independent of F(εt ), both asymptotically and in finite
samples; and the limiting null distribution is invariant to a break in the drift
during the sample, that is, if, under the null, β1 shifts to β1 + δ at some point Tb .
Referring back to H0 of (2.27), a special case arises if β1 = 0, corresponding
to the random walk without drift, yt = εt ; however as λr,β is a test statistic
based on the ranks of yt , it is invariant to monotonic transformations of yt ,
a simple example being when a constant is added to each observation in the
series, so the same test statistic would result under this special null.
Variations on the test statistic λ_{r,β} are possible, which may improve its small-sample performance. BG (op. cit.) suggest a transformation of the ranks using the inverse standard normal distribution function, which can improve power when the distribution of ε_t is not normal (see BG, op. cit., table 2). A nonlinear transformation is applied to the normalised ranks r̃^n_t using the inverse of the standard normal distribution function, Φ^{−1}(.), and the revised ranks are:
The test statistic, with this variation, using r̃^ins_t rather than r̃^n_t, is referred to as λ^(ins)_{r,β}. Some quantiles for λ_{r,β} and λ^(ins)_{r,β} are given in Table 2.9.

[Table 2.9 Quantiles for λ_{r,β} and λ^(ins)_{r,β}, for T = 100, 250 and ∞. Source: extracted from BG (op. cit., table 6).]
First, define the long-run variance in the usual way (see also UR Vol. 1, chapter 6), but applied to the ranks:

σ²_{r,lr} = lim_{T→∞} E[T^{−1}(S^R_T)²] = lim_{T→∞} E[T^{−1}(Σ_{j=1}^{T} r̃_j)²]   (2.37)

λ_{r,β} ⇒_D (1/σ²_{r,lr}) ∫₀¹ (B(r) − rB(1))² dr   (2.38)
This is a semi-parametric form, in keeping with the intention of the test not to involve a parametric assumption, such as would be involved in using an AR approximation to obtain σ̃²_{r,lr}. Provided M/T → 0 and M → ∞ as T → ∞, then σ̃²_{r,lr} is consistent for σ²_{r,lr}. The scaling factor 12 outside the brackets in (2.39) arises because, if the errors are iid, the limiting value of the first term in the brackets is 12^{−1} as T → ∞; see BG (op. cit., p. 16). This reflects the normalisation that, if the errors are iid, then σ²_{r,lr} = 1.
To ensure positivity of the estimator, a kernel function ω_M(κ) can be used, resulting in the revised estimator:
σ̃²_{r,lr} = 12 [Σ_{t=1}^{T} ν̃_t²/T + 2 Σ_{κ=1}^{M} ω_M(κ) Σ_{t=κ+1}^{T} ν̃_{t−κ} ν̃_t/T]   (2.40)
The frequently applied Newey-West kernel function uses the Bartlett weights
ωM (κ) = 1 − κ/(M + 1) for κ = 1, ..., M.
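A sketch of the estimator (2.40) with the Bartlett weights; here the input sequence stands in for the ν̃_t of the main text, whose precise construction follows the earlier definitions:

```python
import numpy as np

def lrv_bartlett(v, M):
    """Kernel long-run variance estimator in the form of (2.40), using the
    Bartlett weights w_M(kappa) = 1 - kappa/(M + 1) of the Newey-West scheme;
    v stands in for the nu-tilde sequence of the main text."""
    T = len(v)
    s = np.sum(v**2) / T
    for kap in range(1, M + 1):
        w = 1.0 - kap / (M + 1.0)                        # Bartlett weight
        s += 2.0 * w * np.sum(v[:-kap] * v[kap:]) / T    # sum of v_{t-k} v_t
    return 12.0 * s                                      # scaling by 12, as in (2.40)

rng = np.random.default_rng(4)
v = rng.standard_normal(500)
lrv = lrv_bartlett(v, M=8)
```

The Bartlett weights guarantee a non-negative estimate, which is the point of introducing the kernel.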
The revised test statistic, with the same asymptotic distribution as λ_{r,β}, is:

λ_{lr,β} = (1/σ̃²_{r,lr}) λ_{r,β}   (2.41)
An obvious extension gives rise to the version of the test that transforms the ranks using the inverse standard normal distribution, referred to as λ^(ins)_{lr,β}. Some asymptotic critical values are given in the last column (T = ∞) of Table 2.9.
2.3.2.ii Simulation results
Some results from Monte-Carlo simulations in BG (op. cit.) are summarised here, taking T = 100 as illustrative of the general results. First, consider w_t = y_t, so that the data are not transformed, with the ε_t drawn alternately from the N(0, 1), t(2) and centred χ²(2) distributions. Both the rank test statistic λ_{r,β} and the DF test statistic τ̂_β maintain their size well under the different distributions for ε_t, with τ̂_β generally more powerful than λ_{r,β}, especially for non-normal innovations. The advantage of λ_{r,β} is apparent when an additive outlier (AO) of 5σ contaminates the innovations at T/2. In this case λ_{r,β} maintains its size, whereas τ̂_β becomes over-sized, for example an empirical size of 17.4% for a nominal 5% size.
The AO situation apart, there is no consistent advantage in the rank-score test
when it is applied to wt = yt , which is not surprising given its nonparametric
nature. More promising is the case when the random walk is generated in a nonlinear transformation of the data, so that w_t = f(y_t), with f(.) alternately given by f(y_t) = y_t³, f(y_t) = y_t^{1/3}, f(y_t) = ln y_t and f(y_t) = tan(y_t), and the random walk, without drift, is in w_t; that is, the random walk DGP is in the transformed data. In these cases, λ_{r,β} maintains its size, whereas τ̂_β is seriously over-sized and the λ^(ins)_{r,β} version of the test becomes moderately over-sized.
The case where w_t = ln(y_t), but the test uses y_t, was considered in Section 2.1 (referred to as Case 2), where it was noted that the rejection rate was typically of the order of 80–90% using τ̂_μ at a nominal 5% size. As to the rank-score tests, BG (op. cit.) report that λ_{r,β} and λ^(ins)_{r,β} have empirical sizes of 5.1% and 8.2%, respectively. The problematic case of MA(1) errors was also considered and it also remains a problem for the rank-score tests λ_{r,β} and λ^(ins)_{r,β}, with serious under-sizing when the MA(1) coefficient is positive and serious over-sizing when the MA(1) coefficient is negative.
Illustrations of the rank-score tests and other tests introduced below are given
in Section 2.6.3.
2.4.1 The range and new ‘records’
Let the length of the sequence, i, vary from i = 2, …, T, where T is the overall
sample size. For i = 2, y(1) < y(2) (for simplicity equality is ruled out); now
consider adding a 3rd observation, y3 , then there are three possibilities: Case 1,
y(1) < y3 < y(2) ; Case 2, y3 < y(1) < y(2) ; Case 3, y(1) < y(2) < y3 . In Case 1, y(1) is
still the minimum value and y(2) is still the maximum value; thus, the extremes
are unchanged and, therefore, the range remains unchanged; in Case 2, y3 is
the new minimum; whereas in Case 3, y3 is the new maximum. In Cases 2 and
3, the range has changed and, therefore, R_i^F > 0. In the terminology associated with this test, in Cases 2 and 3 there is a new 'record'.
Next, define an indicator variable as a function of R_i^F so that it counts the number of new records as the index i runs from 1 to T. The number of these new records in a sample of size T is Σ_{i=1}^{T} I(R_i^F > 0), where I(R_i^F > 0) = 0 if R_i^F = 0 and I(R_i^F > 0) = 1 if R_i^F > 0. (For simplicity, henceforth, when the nature of the condition is clear, the indicator variable is written as I(R_i^F).) Thus, in the simple illustration above, in Case 1 R_3^F = 0, whereas in Cases 2 and 3 R_3^F > 0. The average number of new records is R̄^F = T^{−1} Σ_{i=1}^{T} I(R_i^F). The intuition behind forming a test statistic based on the range is that the number of new records declines faster for a stationary series (detrended if necessary) compared to a series that is nonstationary because of a unit root. The test described here, referred to as the range unit root, or RUR, test is due to AES (op. cit.).
It is intuitively obvious that a new record for yt is also a new record for a
monotonic transformation of yt , including nonlinear transformations, and this
is the case under the null and the stationary alternative. Simulations reported
in AES (op. cit., especially table IV) show that the nominal size is very much
better maintained for the RUR test compared to the DF t-type test for a wide
range of monotonic nonlinear transformations, typically 3–5% for a nominal
5% test when T ≤ 250, and is virtually the same for T = 500.
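The record-counting logic, and its invariance to monotonic transformations, can be sketched directly (the function is illustrative):

```python
import numpy as np

def count_new_records(y):
    """Number of indices i >= 2 at which y_i falls outside the running
    [min, max] range, i.e. I(R_i^F > 0) = 1 (a new 'record')."""
    lo = hi = y[0]
    n = 0
    for v in y[1:]:
        if v < lo or v > hi:     # range changes: new minimum or maximum
            n += 1
            lo, hi = min(lo, v), max(hi, v)
    return n

# A new record for y_t is also a new record for a monotonic transform of y_t:
rng = np.random.default_rng(1)
y = np.exp(np.cumsum(0.01 * rng.standard_normal(500)))   # everywhere positive
assert count_new_records(y) == count_new_records(np.log(y))
```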
Thus far the implicit background has been of distinguishing a driftless random
walk from a trendless stationary process. However, care has to be taken when
dealing with DGPs that involve trending behaviour, since the number of new
records will increase with the trend. The alternatives in this case are a series non-
stationary by virtue of a unit root, but with drift, and a series stationary around
a deterministic (linear) time trend: both will generate trending behaviour. The
trending case is dealt with after the trendless case; see Section 2.4.4.
this could be the logarithm, or some other monotonic transformation, of the
original series. Otherwise the set-up is standard, with the null hypothesis (in this
section) that yt is generated by a driftless random walk, yt = yt−1 + zt , where
zt = εt , and {εt }Tt=1 is an iid sequence of random variables with E(εt ) = 0 and
variance σε2 .
As a preliminary to the test statistic, consider the properties of a function of the sample size, say g(T), such that the following condition is (minimally) satisfied:

g(T) Σ_{i=1}^{T} I(R_i^F > 0) = O_p(1)   (2.42)

Then for stationary series, g(T) = ln(T)^{−1}; for I(1) series without drift, g(T) = 1/√T; and for I(1) series with drift, g(T) = 1/T. (The first of these results requires a condition that the i-th autocovariance decreases to zero faster than ln(T)^{−1}; see AES (op. cit.); this condition, known as the Berman condition, is satisfied for Gaussian ARMA processes.)
Thus, a test statistic based on the range for an I(1) series without drift can be formed as:

R^F_UR = T^{−1/2} Σ_{i=1}^{T} I(R_i^F)   (2.43)

Given the rate of decrease of new records, R^F_UR will tend to zero in probability for a stationary time series and to a non-degenerate random variable for an I(1) series. Thus, relatively, R^F_UR will take large values for an I(1) series and small values for a stationary series.
[Table 2.10 source: based on AES (op. cit., table I); 10,000 replications of the null model with ε_t ∼ niid(0, 1).]

R^F_UR ⇒_D (2/π) (ξ + η)² e^{−(1/2)(ξ+η)²}   (2.44)

where ξ →_p B̄(1) and η →_p L_B(0, 1), the latter being the local time at zero of a Brownian motion on [0, 1]; see AES (op. cit., Definition 1 and Appendix A2).
Critically, AES (op. cit., Theorem 1, part (3)) show that if y_t is a stationary Gaussian series (with covariance sequence satisfying the Berman condition), then as T → ∞:

R^F_UR →_p 0
AES show that the finite sample distribution for T = 1, 000, under the null yt =
yt−1 + εt , with εt ∼ niid(0, 1), t ≥ 1, is very close to the asymptotic distribution
of their Theorem 1, especially in the left-tail. Some critical values are reported
in Table 2.10; this table also includes the quantiles for 90% and 95% as they are
required for the situation when the test is used to distinguish between trending
alternatives.
2.4.1.ii Robustness of R^F_UR
AES (op. cit.) show that R^F_UR is robust to a number of important departures from standard assumptions. In particular it is robust to the following problems.
i) Departures from niid errors (which is what the finite sample, simulated crit-
ical values are based upon); for example, draws from fat-tailed distributions
such as the ‘t’ distribution and the Cauchy distribution, and asymmetric
distributions (see AES, op. cit., table III).
ii) Unlike the parametric DF-based tests, R^F_UR is reasonably robust to structural breaks in the stationary alternative. The presence of such breaks may 'confuse' standard parametric tests into assigning these to permanent effects, whereas they are transitory, leading to spurious non-rejection of the unit root null hypothesis. The cases considered by AES were: the single level shift, that is where the alternative is of the form y_t = ρy_{t−1} + δ₁D_{t,1} + ε_t, |ρ| < 1 and D_{t,1} = 0 for t ≤ t_{b,1}, D_{t,1} = 1 for t > t_{b,1}, where t_{b,1} = T/2; and a multiple shift, in this case where the alternative is of the form y_t = ρy_{t−1} + Σ_{i=1}^{2} δ_i D_{t,i} + ε_t, where |ρ| < 1 and D_{t,i} = 0 for t ≤ (T/4)i, D_{t,i} = 1 for (T/4)i < t_{b,i} ≤ (T/2)i. AES (op. cit.) found that: (a) R^F_UR outperformed the DF t-type test, which had no power in most break scenarios; (b) but for the power of R^F_UR to be maintained, compared to the no-break case, the sample size had to be quite large, for example T = 500; (c) the power of R^F_UR deteriorated in the two-break
R^{FB}_UR = (1/√(2T)) [Σ_{i=1}^{T} I(R_i^F) + Σ_{i=1}^{T} I(R_i^B)]   (2.45)
where R_i^B counts the new records on the time-reversed series y^B_t = y_{T−t+1}, t = 1, 2, . . . , T. The revised test statistic R^{FB}_UR has better size fidelity than R^F_UR for AOs in the null DGP early or late in the sample (but not both); it has better power compared to R^F_UR against stationary alternatives with a single structural break.
Given the motivation for using a nonparametric test statistic, we consider the
sensitivity of the finite sample critical values to different error distributions. By
way of an illustration of robustness, consider the quantiles when {εt } is an iid
sequence and alternatively drawn from a normal distribution and the t distri-
bution with 3 degrees of freedom, t(3). The latter has substantially fatter tails
than the normal distribution. One question of practical interest is: what is the
actual size of the test if the quantiles assume εt ∼ niid(0, 1) but, in fact, they
are drawn from t(3)? In this case, and taking T = 100, 250, by way of example,
the actual size is then compared with the nominal size. By way of benchmarking, the results are also reported for the DF t-type test statistics, τ̂_μ and τ̂_β. Some
of the results are summarised in Table 2.12, with QQ plots in Figures 2.3 and
2.4, the former for the DF tests and the latter for the range unit root tests. (The
QQ plots take the horizontal axis as the quantiles assuming εt ∼ niid(0, 1) and
the vertical axis as the quantiles realised from εt ∼ t(3); differences from the 45◦
line indicate a departure in the pairs of quantiles.)
There are four points to note about the results in Table 2.12 and the
illustrations in Figures 2.3 and 2.4.
i) The DF t-type tests are still quite accurate as far as nominal and actual size
are concerned; where there is a departure, it tends to be at the extremes of
the distribution (see Figure 2.3), which are not generally used for hypothesis
testing.
[Table 2.12 Empirical size when ε_t ∼ t(3), using quantiles based on ε_t ∼ niid(0, 1)]

            1%       5%       10%
T = 100
τ̂_μ        1.08%    5.06%    10.32%
τ̂_β        1.39%    5.03%    9.55%
R^F_UR      1.85%    5.35%    10%
R^{FB}_UR   1.22%    5%       10%
T = 250
τ̂_μ        1.39%    5.43%    10.27%
τ̂_β        1.07%    4.87%    9.32%
R^F_UR      1.57%    6.16%    10.13%
R^{FB}_UR   1%       5%       10%
[Figure 2.3 QQ plots for the DF tests τ̂_μ and τ̂_β, T = 250: niid(0, 1) quantiles (horizontal axis) against t(3) quantiles (vertical axis); departure at the upper quantiles.]

[Figure 2.4 QQ plots for the range unit root tests, T = 250: niid(0, 1) quantiles against t(3) quantiles; lower panel, R^{FB}_UR (forward–backward test), close to perfect alignment.]
iii) R^{FB}_UR maintains its size quite generally throughout the range (see Figure 2.4, lower panel).
iv) Overall, although the differences among the test statistics are not marked, R^{FB}_UR is better at maintaining size compared to R^F_UR.
Further, reference to AES (op. cit., table III) indicates that there is a power gain in using R^F_UR, compared to a DF test, for a number of fat-tailed or asymmetric distributions, for near-unit root alternatives, for example for ρ = 0.95 to 0.99 when T = 250.
Another practical issue related to robustness concerns the characteristics of the RUR tests when the sequence zt does not comprise iid random variables (that is, zt ≠ εt). AES (op. cit.) report simulation results for size and power when zt is generated by an MA(1) process, zt = εt + θ1εt−1, with θ1 taking values in the range 0.5 to −0.8. AES find that, with a sample size of T = 100 and a 5% nominal size, both R^F_UR and R^FB_UR are (substantially) under-sized for θ1 = 0.5, for example an actual size of 0.4% for R^F_UR and 0.6% for R^FB_UR. In the case of negative values of θ1, which is often the case of greater interest, R^F_UR and R^FB_UR are over-sized, which is the usual effect on unit root tests in this case, but R^FB_UR performs better than R^F_UR and is only moderately over-sized; for example, when θ1 = −0.6, the empirical sizes of R^F_UR and R^FB_UR are 20% and 9%, respectively. However, the DF t-type test, with lag selection by MAIC, due to Ng and Perron (2001) (see UR, Vol. 1, chapter 9 for details of MAIC), outperforms both R^F_UR and R^FB_UR in terms of size fidelity.
The following describes the procedure suggested by AES (op. cit.) for such situations.
The stochastically trended and deterministically trended alternatives are, respectively:

yt = β1 + yt−1 + εt,  β1 ≠ 0    (2.46)

(yt − β0 − β1 t) = zt,  zt ∼ I(0)    (2.47)

In the first case, yt ∼ I(1) with drift, which generates a direction or trend to the random walk, whereas in the second case, yt ∼ I(0) about a deterministic trend. The key result as far as the RUR test is concerned is that R^F_UR and R^FB_UR are divergent under both processes, so that:

R^F_UR = Op(T^{1/2}) → ∞ as T → ∞

under both (2.46) and (2.47). Thus, the right tail of the distribution of R^F_UR is now relevant for hypothesis testing, where the null hypothesis is of a driftless random walk against a trended alternative of the form (2.46) or (2.47); for appropriate upper quantiles, see Table 2.10 for R^F_UR and Table 2.11 for R^FB_UR.
As the divergence of R^F_UR follows for both the I(1) and the trended I(0) alternatives, it is necessary to distinguish these cases by a further test. The following procedure was suggested by AES (op. cit.). The basis of the supplementary test is that if yt ∼ I(1) then Δyt ∼ I(0), whereas if yt ∼ I(0) then Δyt ∼ I(−1), so the test rests on distinguishing the order of integration of Δyt.
If a variable xt is I(0), then its infinite lagged sum is I(1), so summation raises the order of integration by one each time the operator is applied. The same result holds if a series is defined as the infinite lagged sum of the even-order lags. The sequence x̃t^(k) is defined as follows:

x̃t^(k) ≡ Σ_{j=0}^{∞} L^j x̃_{t−j}^{(k−1)}    (2.48)

where x̃t^(0) ≡ xt, so that applying the operator zero times is the identity.
Some of the possibilities of interest for determining the integration order are:
A: if xt ∼ I(0), then x̃t^(1) ≡ Σ_{j=0}^{∞} L^j x_{t−j} ∼ I(1).
B.i: if xt ∼ I(−1), then x̃t^(1) ≡ Σ_{j=0}^{∞} L^j x_{t−j} ∼ I(0).
B.ii: if x̃t^(1) ∼ I(0), then x̃t^(2) ≡ Σ_{j=0}^{∞} L^j x̃_{t−j}^(1) ∼ I(1).
These results may now be applied in the context that xt = Δyt, giving rise to two possibilities that enable a drifted random walk to be distinguished from a trend stationary process:
Possibility A:
If xt ∼ I(0), then x̃t^(1) ∼ I(1), and using R^F_UR (or R^FB_UR) with the variable x̃t^(1) should result in non-rejection of the null hypothesis.
Possibility B:
If xt ∼ I(−1), then x̃t^(1) ∼ I(0), and using R^F_UR (or R^FB_UR) with the variable x̃t^(1) should result in rejection of the null hypothesis; further, x̃t^(2) ∼ I(1), and using R^F_UR (or R^FB_UR) with the variable x̃t^(2) should result in non-rejection of the null hypothesis.
Hence, given that in the case of the trending alternative a rejection of the null hypothesis has first occurred using the upper quantiles of the null distribution, the decision is to conclude in favour of a drifted random walk if possibility A results, but in favour of a trend stationary process if possibility B results. Since the overall procedure will involve not less than two tests, each at, say, an α% significance level, the overall significance level will (generally) exceed α%; for example, in the case of two independent tests, the upper limit on the overall size is 1 − (1 − α)². Note that the infinite sums implied in the computation of x̃t^(j) have to be truncated to start at the beginning of the sample. An example of the use of these tests is provided in Section 2.6.3.ii as part of a comparison of a number of tests in Section 2.6.
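The truncated summation in (2.48) is simple to implement; the following is a minimal sketch (the function name is illustrative, and the infinite sum is truncated at the beginning of the sample, as noted above):

```python
import numpy as np

def even_lag_sum(x):
    """One application of the summation operator in (2.48): the sum of
    even-order lags, x~_t = x_t + x_{t-2} + x_{t-4} + ..., with the
    infinite sum truncated at the beginning of the sample."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for t in range(len(x)):
        out[t] = x[t::-2].sum()   # x_t, x_{t-2}, x_{t-4}, ...
    return out
```

Applying the operator once to xt = Δyt gives x̃t^(1) (Possibility A); applying it to the result gives x̃t^(2) (Possibility B.ii).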
Another basis for unit root tests that uses less than a full parametric structure is the different behaviour of the variance of the series under the I(1) and I(0) possibilities. Variance ratio tests are based on the characteristic of the partial sum (unit root) process with iid errors that the variance increases linearly with time. Consider the simplest case where yt = y0 + Σ_{i=1}^{t} εi, with εt ∼ iid(0, σε²) and, for simplicity, assume that y0 is nonstochastic; then var(yt) = tσε². Further, note that Δ1yt ≡ yt − yt−1 = εt, where the subscript on Δ1 emphasises that this is the first difference operator; higher-order differences are indicated by the subscript. Also var(Δ1yt) = σε² and Δ2yt ≡ yt − yt−2 ≡ Δ1yt + Δ1yt−1; hence, by the iid assumption, var(Δ2yt) = 2σε². Generalising in an obvious way, var(Δqyt) = qσε², where Δqyt ≡ yt − yt−q ≡ Σ_{i=0}^{q−1} Δ1yt−i. In words, the variance of the q-th order difference is q times the variance of the first-order difference.
Moreover, this result generalises to heteroscedastic variances provided {εt} remains a serially uncorrelated sequence. For example, introduce a time subscript on σε², say σε,t², to indicate heteroscedasticity; then var(Δqyt) = Σ_{i=0}^{q−1} σε,t−i², so that the variance of the q-th order difference is the sum of the variances of the q first-order differences.
For simplicity, consider the homoscedastic case; then a test of whether yt has been generated by the partial sum process with iid errors can be based on the ratio of (1/q) times the variance of Δqyt to the variance of Δ1yt. This variance ratio is:

VR(q) ≡ (1/q) var(Δqyt)/var(Δ1yt)    (2.49)

The quantity VR(q) can be calculated for different values of q and, in each case, it will be unity under the null hypothesis, H0: yt = y0 + Σ_{i=1}^{t} εi, with εt ∼ iid(0, σε²).
Hence, the null incorporates, or is conditional on, a unit root, with the focus on
the lack of serial correlation of the increments, εt . The nature of the alternative
hypothesis, therefore, requires some consideration; for example, rejection of H0
that VR(q) = 1 could occur because the εt are serially correlated, which is not
a rejection of the unit root in H0 ; however, rejection could also occur if there
is not a unit root. Values of VR(q) that are different from unity will indicate
rejection of the null hypothesis of a serially uncorrelated random walk.
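As a numerical illustration of (2.49), a minimal sketch of the estimator (the function name is illustrative; sample variances are used, without drift or degrees-of-freedom corrections):

```python
import numpy as np

def variance_ratio(y, q):
    """VR(q) of (2.49): (1/q) * var(q-th difference) / var(first difference),
    using sample variances; no drift or degrees-of-freedom correction."""
    y = np.asarray(y, dtype=float)
    d1 = y[1:] - y[:-1]    # first differences
    dq = y[q:] - y[:-q]    # q-th order differences
    return dq.var() / (q * d1.var())
```

For a simulated driftless random walk, VR(q) is close to unity for each value of q, consistent with the null.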
This procedure generalises if there are deterministic components in the generation of yt. For example, assume that (yt − μt) = ut, where E(ut) = 0 and μt includes deterministic components, typically a constant and/or a (linear) trend; as usual, the detrended series is ỹt = yt − μ̂t, where μ̂t is an estimate of the trend, and the variance ratio is then based on ỹt rather than the original series, yt. The overlapping sample estimator of the variance ratio, allowing for drift, is:

OVR(q) ≡ (1/q) [T⁻¹ Σ_{t=q+1}^{T} (Δqyt − qβ̂1)²] / [T⁻¹ Σ_{t=2}^{T} (Δ1yt − β̂1)²]    (2.50)

where β̂1 is the sample mean of Δ1yt. The notation reflects the overlapping nature of the data used in this estimator; this is discussed further below. The estimator OVR(q) allows for drift in the null of a random walk, as the variable in the numerator is the q-th difference, Δqyt, minus an estimator of its mean, qβ̂1. An alternative notation is sometimes used: let xt ≡ Δ1yt; then Δqyt ≡ Σ_{j=0}^{q−1} xt−j and this term is used in OVR(q). Also, the quantity in (2.50) is sometimes defined using a divisor that makes a correction for degrees of freedom in estimating the variance; see Lo and MacKinlay (1989).
A test based on OVR(q) uses overlapping data in the sense that successive elements have a common term or terms; for example, for q = 2, Δ2yt = Δ1yt + Δ1yt−1 and Δ2yt−1 = Δ1yt−1 + Δ1yt−2, so the 'overlapping' element is Δ1yt−1. To put this in context, suppose the Δ1yt are daily returns on a stock price; then, assuming five working days, a weekly return can be defined as the sum of five consecutive daily returns. Thus, for example, Δ5yt = Σ_{j=0}^{4} Δ1yt−j, with successive element Δ5yt+1 = Σ_{j=0}^{4} Δ1yt+1−j, so that the overlapping elements are Σ_{j=0}^{3} Δ1yt−j. This differs from the weekly return defined as the sum of the five daily returns in each week, the time series for which would be non-overlapping. A non-overlapping version of VR(q), NOVR, is also possible, although the overlapping version is likely to have higher power; see Lo and MacKinlay (1989) and also Tian, Zhang and Huan (1999) for the exact finite-sample distribution of the NOVR version.
Lo and MacKinlay (1988) showed that:

√T (OVR(q) − 1) ⇒D N(0, 2(2q − 1)(q − 1)/(3q))    (2.51)

and hence that:

√T (OVR(q) − 1) [3q/(2(2q − 1)(q − 1))]^{1/2} ⇒D N(0, 1)    (2.52)
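A hedged sketch of the overlapping estimator and the standardised statistic in (2.52); the function names are illustrative, the drift is estimated by the mean first difference, and the degrees-of-freedom correction mentioned above is omitted:

```python
import numpy as np

def ovr(y, q):
    """Overlapping variance-ratio estimator in the spirit of (2.50)
    (Lo-MacKinlay), allowing for drift: the numerator uses overlapping
    q-th differences centred at q times the mean first difference."""
    y = np.asarray(y, dtype=float)
    T = len(y) - 1               # number of first differences
    mu = (y[-1] - y[0]) / T      # drift estimate: mean first difference
    d1 = y[1:] - y[:-1]
    dq = y[q:] - y[:-q]
    s1 = np.mean((d1 - mu) ** 2)
    sq = np.mean((dq - q * mu) ** 2)
    return sq / (q * s1)

def ovr_z(y, q):
    """Standardised statistic of (2.52): sqrt(T)*(OVR-1) divided by the
    asymptotic standard deviation sqrt(2(2q-1)(q-1)/(3q)); approximately
    N(0,1) under the iid random-walk null."""
    T = len(y) - 1
    v = 2.0 * (2 * q - 1) * (q - 1) / (3.0 * q)
    return np.sqrt(T) * (ovr(y, q) - 1.0) / np.sqrt(v)
```

Under the null the standardised statistic is approximately standard normal; for an over-differenced (already stationary) series, OVR(q) falls well below unity.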
Several practical points arise in using OVR(q). The first concerns bias: if b(ψ) is the bias of an estimator ψ̂ of ψ, then a corrected version subtracts the bias from the estimator; thus E(ψ̂ − b(ψ)) = ψ. To this end, OVR(q) can be written as a function of the sample autocorrelations ρ̂(j) (see Cochrane (1988)), that is:

OVR(q) = 1 + 2 Σ_{j=1}^{q−1} (1 − j/q) ρ̂(j)    (2.53)
The expected value of ρ̂(j) for a serially uncorrelated time series, ignoring terms of higher order than O(T⁻¹), is given by Kendall (1954):

E(ρ̂(j)) = −1/(T − j)    (2.54)

This expectation should be zero for an unbiased estimator; hence (2.54) is also the first-order bias of ρ̂(j). The bias is negative in finite samples, but disappears as T → ∞, for fixed j. Thus, ρ̂(j) + (T − j)⁻¹ is a first-order unbiased estimator of ρ(j) under the null of serially uncorrelated increments.
Noting that E(OVR(q)) is a linear function of the E(ρ̂(j)), the first-order bias in estimating OVR(q) by (2.50) under the null is:

B(OVR(q)) = −(2/q) Σ_{j=1}^{q−1} (q − j)/(T − j)    (2.56)
See also Shiveley (2002). An estimator that is unbiased to order O(T−1 ) under the
null is obtained as ROVR(q) = OVR(q) – B(OVR(q)). However, the bias in (2.54)
is only relevant under the null, so that a more general approach would correct
for bias in estimating ρ(j) whether under the null or the alternative hypotheses.
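The bias term in (2.56) is just a finite sum; for example:

```python
def ovr_bias(T, q):
    """First-order bias B(OVR(q)) of (2.56), valid under the null of
    serially uncorrelated increments."""
    return -(2.0 / q) * sum((q - j) / (T - j) for j in range(1, q))

# ROVR(q) = OVR(q) - ovr_bias(T, q) is then unbiased to order O(1/T)
# under the null.
```

The bias is negative and shrinks as T grows, in line with (2.54).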
The second point is that if q is large relative to T, then the standard normal
distribution does not approximate the finite sample distribution very well, with
consequent problems for size and power; see Richardson and Stock (1989) and
Deo and Richardson (2003). An alternative asymptotic analysis with q → ∞
as T → ∞, such that (q/T) → λ, provides a better indication of the relevant
distribution from which to obtain quantiles for testing H0 . Deo and Richardson
(2003) showed that under this alternative framework, OVR(q) converges to the
following non-normal limiting distribution:
OVR(q) ⇒ [1/((1 − λ)² λ)] ∫_λ^1 (B(s) − B(s − λ) − λB(1))² ds    (2.57)
for which critical values can be obtained by simulation; see Deo and Richardson
(2003).
The third point is that variance ratio tests are often carried out for different values of q; this is because, under the null, the variance ratio is unity for all values of q. Joint procedures exist for such multiple applications of the variance ratio test and it is, therefore, possible to control for the overall size.
Fourth, the random walk hypothesis is often of interest in situations where
homoscedasticity is unlikely; for example, there is a considerable body of evi-
dence that financial time series are characterised by heteroscedasticity, including
ARCH/GARCH-type conditional heteroscedasticity. The variance ratio test can
be generalised for this and some other forms of heteroscedasticity, which again
results in a test statistic that is distributed as standard normal under the null
hypothesis; see Lo and MacKinlay (1989).
A problem with variance ratio-based tests is that they are difficult to generalise to non-iid errors; however, Breitung (2002) has suggested a test based on a variance ratio principle, outlined in the next section, that is invariant to the short-run dynamics and is robust to a number of departures from normally distributed innovations.
The simplest version of the statistic is:

νK = [T⁻² Σ_{t=1}^{T} Yt²] / [T⁻¹ Σ_{t=1}^{T} yt²]    (2.58)

where Yt ≡ Σ_{i=1}^{t} yi; see also UR, Vol. 1, chapter 11. The denominator is an estimator of the long-run variance assuming no serial correlation; if this is not the case, it is replaced by another estimator, either a semi-parametric estimator with a kernel function, such as the Newey-West estimator, or a parametric version that uses a 'long' autoregression (see UR, Vol. 1, chapter 6).
The statistic νK is the ratio of the variance of the partial sum to the variance of yt, normalised by T. If the null hypothesis is of stationarity about a non-zero constant or a linear trend, then the KPSS test statistic is defined in terms of the detrended series ỹt, so that:

νK = T⁻¹ Σ_{t=1}^{T} Ỹt² / Σ_{t=1}^{T} ỹt²    (2.59)

where Ỹt is the partial sum of ỹt, that is Ỹt ≡ Σ_{i=1}^{t} ỹi. Breitung's variance ratio statistic, νrat, scales this ratio by a further factor of T⁻¹, that is νrat = T⁻² Σ_{t=1}^{T} Ỹt² / Σ_{t=1}^{T} ỹt², so that it possesses a non-degenerate limiting distribution under the I(1) null.
When using νrat as a test of stationarity, the critical values come from the right
tail of the null distribution of νrat . However, Breitung (2002) suggests using νrat as
a unit root test, that is of the null hypothesis that yt is I(1) against the alternative
that yt is I(0), so that the appropriate critical values for testing now come from
the left tail of the null distribution.
Breitung (2002, Proposition 3) shows that νrat has the advantage that its asymptotic null distribution does not depend on the nuisance parameters, in this case those governing the short-run dynamics. The limiting null distribution when there are no deterministic components is:

νrat ⇒ [∫_0^1 (∫_0^r B(s) ds)² dr] / [∫_0^1 B(r)² dr]    (2.60)

When the series is demeaned or linearly detrended, B(r) is replaced by the corresponding demeaned or detrended Brownian motion:

B(r)μ = B(r) − ∫_0^1 B(s) ds    (2.62)

B(r)β = B(r) + (6r − 4) ∫_0^1 B(s) ds − (12r − 6) ∫_0^1 sB(s) ds    (2.63)
                 νrat                    νrat,μ                  νrat,β
T          10%     5%      1%      10%     5%      1%      10%     5%      1%
100      0.0313  0.0215  0.0109  0.0144  0.0100  0.0055  0.0044  0.0034  0.0021
250      0.0293  0.0199  0.0097  0.0143  0.0100  0.0056  0.0044  0.0034  0.0022
500      0.0292  0.0199  0.0099  0.0147  0.0105  0.0054  0.0044  0.0036  0.0026
Source: Breitung (2002, table 5); a sample value less than the α% critical value leads to rejection of H0; the column grouping and significance labels are inferred from the left-tail rejection rule.
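The statistic is particularly simple to compute; a minimal sketch, assuming OLS demeaning or linear detrending and computing the sum of squared partial sums of the detrended series divided by T² times its sum of squares (the function name and interface are illustrative):

```python
import numpy as np

def breitung_nu(y, det="mu"):
    """Sketch of Breitung's (2002) variance-ratio statistic: demean ('mu')
    or linearly detrend ('beta') the series, form the partial sums, and
    return (sum of squared partial sums) / (T^2 * sum of squared series).
    Small values lead to rejection of the I(1) null (left-tail test)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    if det == "mu":
        ytil = y - y.mean()
    else:  # residuals from a regression on a constant and linear trend
        X = np.column_stack([np.ones(T), np.arange(1.0, T + 1.0)])
        ytil = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    Y = np.cumsum(ytil)
    return (Y ** 2).sum() / (T ** 2 * (ytil ** 2).sum())
```

For an I(1) series the statistic is Op(1), whereas for an I(0) series it collapses towards zero, which is why small values reject the unit root null.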
The models for which νrat,j is relevant are of the following familiar form:

ỹt = ρỹt−1 + zt
zt = ω(L)εt
ỹt = yt − μt

where ω(L) = 1 + Σ_{j=1}^{J} ωj L^j, with εt ∼ iid(0, σε²), σε² < ∞ and Σ_{j=1}^{J} j²ωj² < ∞; J may be infinite, as in the case that the generation process of zt includes an AR component. Error processes other than the standard linear one are allowed; for example, zt could be generated by a fractionally integrated process, or a non-linear process, provided that it admits a similar Beveridge-Nelson type of decomposition; see Breitung (2002) for details.
Breitung (2002) reports some simulation results for the problematic case with an MA(1) error generating process, so that Δyt = zt , with zt = (1 + θ1 L)εt and
T = 200, where the test is based alternatively on the residuals from a regression
of yt on a constant or a constant and a trend. The results indicate a moderate
size bias for νrat,j , j = μ, β, when θ1 = −0.5, which increases when θ1 = −0.8; whether this is better than the ADF-based τ̂j depends on the lag length used in the latter, with ADF(4) suffering a worse size bias, but ADF(12) being better on size though worse on power.
The simulation results are for the case where the DGP is: yt = ρyt−1 + zt , zt =
(1 + θ1 L)εt , ρ = (1, 0.95, 0.9, 0.85), θ1 = (0, –0.5) and T = 200; size is evaluated
when ρ = 1 for different values of θ1 , whereas power is evaluated for ρ < 1. The
nominal size of the tests is 5%, and the alternative innovation distributions are
N(0, 1), t(3) and Cauchy. The rank score-based test λlr,β requires estimation of
the long-run variance when θ1 = −0.5 (see Section 2.3.2, especially Equation
(2.40)). A truncation parameter of M = 8 was used, as in BG (op. cit.) and, for
comparability, a lag of 8 was used for the DF tests.
The results are summarised in Table 2.14 in two parts. The first set of results
is for θ1 = 0 and the second set for θ1 = –0.5. In the case of ρ < 1, both ‘raw’ and
size-adjusted power are reported, the latter is indicated by SA.
The first case to be considered is when the errors are normal and there is no
MA component, so there is no distinction between errors and innovations. Size
is generally maintained at its nominal level throughout. The standard DF tests
might be expected to dominate in terms of power; however, this advantage is not
uniform and the rank-based DF tests are comparable despite the fact that they
only use part of the information in the sample; for example, the rank-based
t-type test is more powerful than its parametric counterpart and is the most
powerful of the nonparametric tests considered here. An explanation for this
finding is that the mean of the ranks, r̄, is known, whereas in the parametric test
the mean has to be estimated, which uses up a degree of freedom; see Granger
and Hallman (1991) for further details. Of the other nonparametric tests, R^FB_UR is
in second place. In the power evaluation, the rank-based score tests λr,β and λlr,β
are likely to be at a disadvantage because in this simulation set-up, where H0 is
a random walk without drift, they over-parameterise the alternative, and this is
confirmed in the simulation results. Their relative performance would improve
against δ̂r,β and τ̂r,β .
In the case of the innovations drawn from t(3), there are some slight size
distortions. Otherwise the ranking in terms of power is generally as in the normal
innovations case; of note is that νrat,μ gains somewhat in power whereas λr,μ
loses power. The Cauchy distribution is symmetric, but has no moments, and
may be considered an extreme case: here the DF rank-based tests δ̂r,μ and τ̂r,μ become under-sized, whereas R^F_UR is over-sized; apart from R^F_UR,
Table 2.14 Summary of size and power for rank, range and variance ratio tests

θ1 = 0
εt ∼ t(3)     δ̂μ      τ̂μ      δ̂r,μ    τ̂r,μ    R^F_UR   R^FB_UR  λr,β    νrat,μ
ρ = 1         4.10    5.20    3.90    3.85    5.95     3.60     4.95    4.85
ρ = 0.95     22.35   12.90   19.45   19.50   17.75    20.85    10.25   18.10
ρ = 0.90     56.30   37.50   47.20   45.15   31.55    32.90    16.85   32.95
ρ = 0.85     90.10   74.55   77.40   76.15   44.45    54.20    24.70   47.45

θ1 = −0.5
εt ∼ Normal   δ̂μ      τ̂μ      δ̂r,μ    τ̂r,μ    R^F_UR   R^FB_UR  λlr,β   νrat,μ
ρ = 1          9.7     4.2     6.9     1.5    26.6     33.4     53.5     6.5
ρ = 0.95      59.8    24.8    48.2    16.9    76.2     91.8     86.4    40.1
SA            43.1    28.5    39.9    38.1    19.4     36.0     14.1    35.4
ρ = 0.90      93.3    61.1    84.5    51.2    92.0     99.5     93.8    69.1
SA            82.6    66.3    78.0    76.9    38.6     67.2     34.0    64.5
ρ = 0.85      99.1    84.7    95.7    78.0    95.8    100.0     93.6    85.8
SA            96.0    88.5    93.2    93.0    51.2     81.4     41.4    83.2

Notes: T = 200; the first row of each entry is the 'raw' size or power, and the SA row is the size-adjusted power when ρ < 1; best performers in bold in the original.
power is clearly lost throughout. Overall, δ̂μ remains the best test; R^F_UR has size-adjusted power that is better than that of δ̂μ for part of the parameter space, but since it is over-sized the comparison is misleading; a bootstrapped version that corrects the incorrect size may have something to offer, but is not pursued here.
A more problematic case for unit root testing is when there is an MA com-
ponent to the serial correlation, for example, in the first-order case when the
MA coefficient is negative, and θ1 = –0.5 is taken here as illustrative. This case
is known to cause difficulties to the parametric tests, for which some protection
is offered by extending the lag in an ADF model. A priori this may offer some
relative advantage to νrat , which is invariant to the serial correlation.
The first question, when ρ = 1 and θ1 = −0.5, relates to the retention of size, where the nominal size is 5% in these simulations. In the case of normal innovations, the best of the nonparametric tests is now νrat,μ, with R^F_UR and R^FB_UR suffering size distortions. The size distortion of λlr,μ is consistent with the results
reported in BG (op. cit.). Of the ADF rank tests, δ̂r,μ is slightly over-sized, whereas
τ̂r,μ is under-sized. The lag length is long enough for τ̂μ to maintain its size rea-
sonably well at 4.2%, but as observed in UR, Vol. 1, chapter 6, δ̂μ is not as robust
as τ̂μ . The comparisons of power will now only make sense for the tests that
do not suffer (gross) size distortions. Both ‘raw’ and size-adjusted (SA) power is
given for the cases where ρ < 1. In this case, it is now the simple nonparametric
test νrat,μ that has the advantage over the other tests.
The next question relates to the effect of non-normal innovations on the
performance of the various tests. When the innovations are drawn from t(3),
τ̂μ again maintains its size reasonably well at 4.0%, and δ̂r,μ and νrat,μ are next
best at 7.7%. The most powerful (in terms of SA power) of the tests is δ̂r,μ , with
honours fairly even between τ̂μ and νrat,μ depending on the value of ρ. With
Cauchy innovations, τ̂μ , δ̂r,μ and νrat,μ are comparable in terms of size retention,
whereas δ̂r,μ and νrat,μ are best in terms of SA power.
Overall, there is no single test that is best in all the variations considered
here. The rank-based tests are useful additions to the ‘stock’ of tests, but the test
least affected by the variations considered here is Breitung’s νrat , which shows
good size fidelity and reasonable power throughout, and is particularly simple
to calculate.
yt           δ̂μ       δ̂β       τ̂μ       τ̂β       λr,μ     νrat,μ
σ² = 0.5    90.3%    82.6%    83.9%    74.5%    6.2%     5.0%
σ² = 1.0    95.9%    95.8%    94.8%    94.4%    6.4%     4.7%
Notes: T = 200; other details are as in Section 2.1.2 and Table 2.2; 5% nominal size.
Case 1, where the unit root is in the levels of the series but the tests are based on the logs, and Case 2, where the unit root is in the logs but the tests are based on the levels. The results, reported in Table 2.15, show that, unlike the DF tests, λr,μ and νrat,μ maintain close to their nominal size of 5% in both cases.
correlated (see Section 2.4.3 and AES, op. cit., section 6), and here the sample
value is virtually identical to the 5% critical value. Of course, the use of multiple
tests implies that the overall significance level will generally exceed the nominal
significance level used for each test.
The usual predicament, anticipated in Section 2.2, therefore, arises in that the
tests are unable to discriminate between whether the unit root is in the ratio or
the log ratio, and a further test is required to assess this issue. This point was
considered in Section 2.2.3, where the KM test suggested that the unit root was
in the ratio (levels) if the comparison is limited to linear integrated or log-linear
integrated processes.
Table 2.16 Tests for a unit root: ratio of gold to silver prices
[Figure: the ratio of gold to silver prices, 1980–2000; left panel in levels (vertical scale ×10⁷), right panel in logs.]
ln yt 0.240 –0.017 –0.090 –0.237 –0.162 0.0003
(‘t’) (–0.45) (–1.42) (–3.81) (–2.54) (1.98)
ln yt 0.294 –0.020 –0.113 –0.282 –0.213 –0.087 –0.185 –0.0004
(‘t’) (1.77) (–0.53) (–1.77) (–4.42) (–3.28) (–1.36) (–2.90) (0.75)
Notes: a β subscript indicates that the test statistic is based on a linear trend adjusted series; appropriate
critical values were simulated for the test statistics, based on 50,000 replications; Breitung’s λr,μ statistic
allows for drift, if present, under the null.
The parametric structure for the ADF tests is now somewhat more difficult
to pin down. In the case of the levels data, yt , both marginal-t and AIC select
ADF(5), with τ̂β = –2.853, which is greater than the 5% cv of –3.4, which leads
to non-rejection of the null; however, BIC suggests that no augmentation is
necessary and τ̂β = –4.729, which, in contrast, leads to rejection of the null
hypothesis. When the log of yt is used, BIC selects ADF(3), with τ̂β = –0.45,
which strongly suggests non-rejection; AIC and marginal-t both select ADF(5),
with τ̂β = –0.53, confirming this conclusion. At this stage, based on these tests,
it appears that if there is a unit root then it is in the logarithm of the series; see
Table 2.17 for a summary of the regression results and Table 2.18 for the unit
root test statistics.
The next step is to compare the ADF test results with those for the nonpara-
metric tests. The rank ADF tests now use (linearly) detrended data, and both
δ̂r,β = −14.77 and τ̂r,β = −2.77 suggest non-rejection of the null. The RUR tests
now compare the sample value with the 95% quantile (that is, they become
right-sided tests), and both lead to rejection of a driftless random walk in favour
of a trended alternative. Also the rank-score test, λlr,μ , leads to non-rejection, as
does the variance ratio test νrat,β .
KM tests
There are two remaining issues. First, for the RUR tests, the question is whether
the rejection arises from a drifted random walk or a trend stationary process.
To assess this issue, the auxiliary procedure suggested in Section 2.4.4 is the
next step. This procedure involves a second application of the RUR test, but
this time on a variable that is based on a sum of the variable in the first test, referred to in Section 2.4.4 as x̃t^(1). The resulting sample values of the R^F_UR and R^FB_UR test statistics applied to x̃t^(1) are now 2.98 and 4.30 respectively; these values
strongly indicate non-rejection of the null hypothesis of a unit root as the 5%
quantiles are (approximately) 1.31 and 1.90 respectively; moreover, there is now
no rejection in the upper tail, as the 95% quantiles are (approximately) 3.29 and
4.65 respectively. In terms of the decisions outlined in Section 2.4.4, we find
in favour of possibility A, that is, the drifted integrated process rather than the
trend stationary process.
Second, non-rejection does not indicate the transformation of the series that
is appropriate. If the choice is limited to either a linear integrated or a log-linear
integrated process, the KM tests are relevant. In this case, the tests referred to
as V1 and V2 , which allow for a drifted integrated process, are reported in Table
2.19. The first of these leads to rejection of the null of linearity, whereas the
second suggests non-rejection of the null of log-linearity.
A practical problem for empirical research often relates to the precise choice
of the form of the variables to be used. This can be a key issue, even though
quite frequently there is no clear rationale for a particular choice; a leading
case, but of course by no means the only one, is whether a time series should
be modelled in levels (the ‘raw’ data) or transformed into logarithms. There
are very different implications for choosing a linear integrated process over a
log-linear integrated process, since the former relates to behaviour of a random
walk type whereas the other relates to an exponential random walk. A two-stage
procedure was suggested in this chapter as a way of addressing this problem. In
the first instance, it is necessary to establish that the null hypothesis of a unit
root is not rejected for some transformation of the data; this stage may conclude
that there is an integrated process generating the data, and the second stage
decision is then to assess whether this is consistent with a linear or log-linear
integrated process. Of course this addresses just two possible outcomes, linear or log-linear; other transformations may also be candidates, although these are the leading ones.
The second strand to this chapter was to extend the principle for the construc-
tion of unit root tests from variants of direct estimation of an AR model, as in
the DF and ADF tests, to include tests based on the rank of an observation in a
set and the range of a set of observations. Neither of these principles depended
on a particular parametric structure. Another framework for constructing a unit
root test without a complete parametric structure is based on the behaviour of
the variance of a series with a unit root. Variance-ratio tests are familiar from
the work of Lo and MacKinlay (1988), but the basic test is difficult to extend
for non-iid errors; however, a simple and remarkably robust test due to Breitung
(2002) overcomes this problem.
Other references of interest in this area include Delgado and Velasco (2005),
who suggested a test based on signs that is applicable to testing for a unit root
as well as other forms of nonstationarity. Hasan and Koenker (1997) suggested
a family of rank tests of the unit root hypothesis. Charles and Darné (1999)
provided an overview of variance ratio tests of the random walk. Wright (2000)
devised variance-ratio tests using ranks and signs. Tse, Ng and Zhang (2004) have
developed a non-overlapping version of the standard VR test. Nielsen (2009)
develops a simple nonparametric test that includes Breitung’s (2002) test as a
special case and has higher asymptotic local power.
Questions
Q2.2. Consider the ADF version of the rank-based test and explain how to obtain
the equivalent DF-type test statistics.
A2.2. In the unaugmented (DF) case, the required statistic is just:
τ̂R,μ = γ̂r / σ̂(γ̂r)    (A2.2)

where τ̂R,μ is the usual t-type test statistic; the extension of δ̂R,μ to the rank-ADF case requires a correction, as in the usual ADF case, that is:

δ̂R,μ = T (ρ̂r − 1) / (1 − ĉr(1))    (A2.3)
where ĉr(1) ≡ Σ_{j=1}^{k−1} ĉr,j . Note that, without formal justification, a detrended
version of this test was used in the application to air revenue passenger miles in
Section 2.6.3.ii.
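As an illustration only (this is not the authors' code, and the exact regression specification is an assumption), the rank-based statistics can be computed by running the usual (A)DF regression on the centred ranks; since the mean of the ranks is known, no intercept is needed:

```python
import numpy as np

def rank_adf(y, k=0):
    """Illustrative rank-(A)DF regression: replace y_t by its centred rank
    r_t, then regress the rank differences on the lagged rank level and k
    lagged rank differences.  Returns (delta, tau) in the spirit of
    (A2.2)-(A2.3)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    r = np.argsort(np.argsort(y)) + 1.0 - (T + 1.0) / 2.0  # centred ranks
    dr = np.diff(r)
    cols = [r[k:-1]]                        # lagged level r_{t-1}
    for j in range(1, k + 1):
        cols.append(dr[k - j:len(dr) - j])  # lagged differences
    X = np.column_stack(cols)
    Y = dr[k:]
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    s2 = e @ e / (len(Y) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    tau = b[0] / se                          # t-type statistic, as in (A2.2)
    delta = T * b[0] / (1.0 - b[1:].sum())   # normalised bias with ADF correction
    return delta, tau
```

For a stationary series both statistics are large and negative, while for a random walk they lie near the usual DF acceptance region.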
Q2.3. Develop a variance ratio test based on the signs; see for example, Wright
(2000).
A2.3. The signs define a simple transformation of the original series. If xt is a random variable, then the sign function for xt is as follows:

S(xt) = 2I(xt > 0) − 1    (A2.4)

Thus, if xt > 0, then I(xt > 0) = 1 and S(xt) = 1, and if xt ≤ 0 then I(xt > 0) = 0 and S(xt) = −1. S(xt) has zero expectation and unit variance.
Wright (2000) suggested a sign-based version of a variance-ratio test. Assume that xt has a zero mean; then S(xt) replaces xt in the definition of the variance ratio, to give:

OVRsign = { [(Tq)⁻¹ Σ_{t=q+1}^{T} (Σ_{j=0}^{q−1} s_{t−j})²] / [T⁻¹ Σ_{t=1}^{T} s_t²] − 1 } [2(2q − 1)(q − 1)/(3qT)]^{−1/2}    (A2.5)

where s_{t−j} ≡ S(x_{t−j}). The test takes this form because the identity used in the parametric form of the test, Δq yt ≡ Σ_{j=0}^{q−1} Δ1 yt−j, does not carry over to the sign function; that is, S(Δq yt) ≠ Σ_{j=0}^{q−1} s_{t−j} in general.
This test could be applied to the case where xt ≡ yt and the null model
is yt = σt εt , with assumptions that: (i) σt and εt are independent condi-
tional on the information set, It = {yt , yt−1, . . .}, and (ii) E(εt |It−1 ) = 0. The first
assumption allows heteroscedasticity, for example of the ARCH/GARCH form
or stochastic volatility type. Note that these assumptions do not require εt to
be iid.
There are variations on these tests. For example, the sign-based test is easily
extended if the null hypothesis is of a random walk with drift, and rank-based
versions of these test can be obtained by replacing the signs by ranks (see
Wright (op. cit.), who simulates the finite sample distributions and provides
some critical values for different T and q).
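As a concrete illustration, the statistic in (A2.5) can be computed directly from the signs of the one-period changes. The sketch below is illustrative, not Wright's own code: the function name and the exact window convention (summing over t = q + 1, …, T, as in the formula above) are assumptions.

```python
import numpy as np

def ovr_sign(x, q):
    """Sign-based variance-ratio statistic, a sketch of (A2.5).

    x : one-period changes x_t, t = 1, ..., T
    q : aggregation horizon (q >= 2)
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    s = np.where(x > 0, 1.0, -1.0)      # S(x_t) = 2*I(x_t > 0) - 1
    # numerator: average of squared q-period sums of signs, t = q+1, ..., T
    windows = np.array([s[t - q:t].sum() for t in range(q + 1, T + 1)])
    num = (windows ** 2).sum() / (T * q)
    den = (s ** 2).sum() / T            # identically 1, since each s_t is +1 or -1
    scale = (2 * (2 * q - 1) * (q - 1) / (3 * q * T)) ** -0.5
    return (num / den - 1.0) * scale
```

Because s_t² = 1, the denominator is identically one; in practice, as noted above, Wright (op. cit.) simulates the finite-sample distribution of such statistics rather than relying on an asymptotic approximation.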
Introduction
This chapter is the first of two that consider the case of fractional values of the
integration parameter. That is suppose a process generates a time series that
is integrated of order d, I(d), then the techniques in UR, Vol. 1, were wholly
concerned with the case where d is an integer, the minimum number of dif-
ferences necessary, when applied to the original series, to produce a series that
is stationary. What happens if we relax the assumption that d is an integer?
There has been much recent research on this topic, so that the approach in
these two chapters must necessarily be selective. Like so many developments in
the area of time series analysis one again finds influential original contributions
from Granger; two of note in this case are Granger (1980) on the aggregation of
‘micro’ time series into an aggregate time series with fractional I(d) properties,
and Granger and Joyeux (1980) on long-memory time series.
One central property of economic time series is that the effect of a shock is
often persistent. Indeed in the simplest random walk model, with or without
drift, a shock at time t is incorporated or integrated into the level of the series
for all periods thereafter. One can say that the shock is always ‘remembered’ in
the series or, alternatively, that the series has an infinite memory for the shock.
This infinite memory property holds for more general I(1) processes, with AR
or MA components. What will now be of interest is what happens when d is
fractional, for example d ∈ (0, 1.0). There are two general approaches to the
analysis of fractional I(d) process, which may be analysed either in the time
domain or in the frequency domain. This chapter is primarily concerned with
the former, whereas the next chapter is concerned with the latter.
This chapter progresses by first spending some time on the definition of a
fractionally integrated process, where it is shown to be operational through
the binomial expansion of the fractional difference operator. The binomial
expansion of (1 − L)d is an elementary but essential tool of analysis, as is its
not converge to a finite constant; in short, they are nonsummable. Of course,
this property also applies to the autocorrelations, since ρ(k) = γ(k)/γ(0). However,
an I(d) process is still stationary for d ∈ (0, 0.5), only becoming nonstationary
for d ≥ 0.5. Even though the ρ(k) are not summable for d ≥ 0.5, the coefficients
in the MA representation of the process do tend to zero provided d < 1 and
this is described as ‘nonpersistence’. Thus, a process could be nonstationary
but nonpersistent, and care has to be taken in partitioning the parameter space
according to these different properties.
Having established that we can attach a meaning to $(1-L)^d y_t$, a question
that arises rather naturally is whether such processes are likely to occur in an
economic context. Two justifications are presented here. The first relates to the error
duration model, presented in the form due to Parke (1999), but with a longer
history due, in part, to the 'micropulse' literature, with important contributions by
Mandelbrot and his co-authors (see, for example, Cioczek-Georges and Mandel-
brot, 1995, 1996, and Mandelbrot and Van Ness, 1968). The other justification
is due to Granger (1980), who presented a model of micro relationships which
were not themselves fractionally integrated, but which on aggregation became
fractionally integrated.
We also consider the estimation of d in the time domain and, particularly,
hypothesis testing. One approach to hypothesis testing is to extend the DF
testing procedure to the fractional d case. This results in an approach that is
relatively easy to apply in a familiar framework.
The range of hypotheses of interest is rather wider than, but includes, the
unit root null hypothesis. In a fractional d context, the alternative to the unit
root null is not a stationary AR process, but a nonstationary fractional d process;
because, in this case, the alternative hypothesis is still one of nonstationarity,
another set-up of interest may be that of a stationary null against a nonstationary
alternative.
The sections in this chapter are arranged as follows. Section 3.1 is concerned
with the definition of a fractionally integrated process and some of its basic
properties. Section 3.2 considers the ARFIMA(p, d, q), where d can be fractional,
which is one of the leading long-memory models. Section 3.3 considers the
kind of models that can generate fractional d. Section 3.4 considers the situa-
tion in which a Dickey-Fuller type test is applied when the process is fractionally
3.1 A fractionally integrated process
One could add that $(1-L)^{d-1} y_t = v_t$, $v_t \sim I(1)$, to indicate that d is the minimum
number of differences necessary for the definition to hold. A special case of (3.1)
is where ut = εt and {εt} is a white noise sequence with zero mean and constant
variance $\sigma_{\varepsilon}^2$.
In a sense defined in the next section it is possible to relax the assumption
that d is an integer and interpret (3.1) accordingly either as an I(d) process for
yt or as a unit root process with fractional noise of order I(d − 1).
consider the binomial expansion of the fractional differencing operator $(1-L)^d$
and, in particular, the AR and MA coefficients associated with this operator:
$$(1 - L)^d = \sum_{r=0}^{\infty} A_r^{(d)} L^r$$
where:
$$A_0^{(d)} \equiv 1, \qquad A_r^{(d)} \equiv (-1)^r\,{}_{d}C_{r} \qquad (3.4)$$
$${}_{d}C_{r} \equiv \frac{(d)_r}{r!} = \frac{d!}{r!\,(d-r)!}, \qquad {}_{d}C_{0} \equiv 1, \quad 0! \equiv 1 \qquad (3.5a)$$
$$(d)_r = d(d-1)(d-2)\cdots(d-(r-1)) = d!/(d-r)! \qquad (3.5b)$$
In the integer case, the binomial coefficient ${}_{d}C_{r}$ is the number of ways of choosing
r from d without regard to order. However, in the fractional case, d is not an
integer and this interpretation is not sustained, although ${}_{d}C_{r}$ is defined in the
same way.
Applying the operator $(1-L)^d$ to yt results in:
$$(1-L)^d y_t = y_t + \sum_{r=1}^{\infty} (-1)^r\,{}_{d}C_{r}\, y_{t-r} = A^{(d)}(L)\, y_t \qquad (3.6)$$
$$A^{(d)}(L) \equiv \sum_{r=0}^{\infty} A_r^{(d)} L^r \qquad (3.7)$$
$A^{(d)}(L)$ is the AR polynomial associated with $(1-L)^d$. Using the AR polynomial,
the model of Equation (3.1) with fractional d can be represented as the following
infinite autoregression:
$$y_t = -\sum_{r=1}^{\infty} (-1)^r\,{}_{d}C_{r}\, y_{t-r} + u_t \qquad (3.8a)$$
$$\;\;\; = -\sum_{r=1}^{\infty} A_r^{(d)}\, y_{t-r} + u_t \qquad (3.8b)$$
$$\;\;\; = \sum_{r=1}^{\infty} \pi_r^{(d)}\, y_{t-r} + u_t \quad \text{where } \pi_r^{(d)} \equiv -A_r^{(d)} \qquad (3.8c)$$
3.1.2.ii MA coefficients
Also of interest is the MA representation of Equation (3.1); operating on
the left-hand side of (3.1) with $(1-L)^{-d}$ gives:
$$y_t = (1-L)^{-d} u_t = B^{(d)}(L)\, u_t \qquad (3.9)$$
where:
$$B_0^{(d)} = 1, \qquad B_r^{(d)} \equiv (-1)^r\,{}_{-d}C_{r} \qquad (3.11)$$
$$B^{(d)}(L) \equiv \sum_{r=0}^{\infty} B_r^{(d)} L^r \qquad (3.13)$$
There are recursions for both the AR and the MA coefficients (for details see
Appendix 3.1 at the end of this chapter), respectively, as follows for r ≥ 1:
$$A_r^{(d)} = [(r - d - 1)/r]\,A_{r-1}^{(d)} \qquad (3.14)$$
$$B_r^{(d)} = [(r + d - 1)/r]\,B_{r-1}^{(d)} \qquad (3.15)$$
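The recursions (3.14) and (3.15) are easy to implement and provide a numerical check that $A^{(d)}(L)$ and $B^{(d)}(L)$ are mutually inverse polynomials: their Cauchy product should reproduce the identity. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def frac_coeffs(d, n):
    """AR coefficients A_r^(d) and MA coefficients B_r^(d), r = 0, ..., n,
    from the recursions (3.14) and (3.15) with A_0 = B_0 = 1."""
    A = np.ones(n + 1)
    B = np.ones(n + 1)
    for r in range(1, n + 1):
        A[r] = (r - d - 1) / r * A[r - 1]
        B[r] = (r + d - 1) / r * B[r - 1]
    return A, B

A, B = frac_coeffs(0.4, 50)
# Cauchy product of the two polynomials: should be (1, 0, 0, ...)
identity = np.convolve(A, B)[:51]
```

For d = 0.4 the first-order terms are A₁ = −0.4 and B₁ = 0.4, and the convolution is one in position zero and (numerically) zero elsewhere.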
The notation $\Gamma(k)$ is used for the general case and $\Gamma(n)$ for an integer argument. The Gamma
function interpolates the factorials between integers, so that for integer n ≥ 0,
$\Gamma(n+1) = n!$. The property (n + 1)! = (n + 1)n!, which holds for integer
n ≥ 0, also holds for the Gamma function without the integer restriction; this
is written as $\Gamma(n+2) = (n+1)\Gamma(n+1)$.
In the notation of this chapter, the binomial coefficients defined in (3.5) are:
$${}_{d}C_{r} = \frac{(d)_r}{r!} = \frac{\Gamma(d+1)}{\Gamma(r+1)\,\Gamma(d-r+1)} \qquad (3.17)$$
The autoregressive and moving average coefficients are given, respectively, by:
$$A_r^{(d)} = \frac{\Gamma(r-d)}{\Gamma(r+1)\,\Gamma(-d)} \qquad (3.18)$$
$$B_r^{(d)} = \frac{\Gamma(r+d)}{\Gamma(r+1)\,\Gamma(d)} \qquad (3.19)$$
Using Stirling's theorem, it can be shown that $\Gamma(r+h)/\Gamma(r+1) = O(r^{h-1})$;
applying this for h = −d and h = d, respectively, then as r → ∞, we have:
$$A_r^{(d)} \sim \frac{1}{\Gamma(-d)}\, r^{-d-1} \quad \text{(AR coefficients)} \qquad (3.20)$$
$$B_r^{(d)} \sim \frac{1}{\Gamma(d)}\, r^{d-1} \quad \text{(MA coefficients)} \qquad (3.21)$$
The AR coefficients will not (eventually) decline unless d > −1 and the MA
coefficients will not (eventually) decline unless d < 1. Comparison of the speed of
decline is interesting. For 0 < d < 1, the AR coefficients (asymptotically) decline
faster than the MA coefficients, whereas for d < 0, the opposite is the case. For
given d, the relative rate of decline of $A_r^{(d)}/B_r^{(d)}$ is governed by $r^{-2d}$, which can
be substantial for large r and even quite modest values of d. For example, for
d = 0.5 the relative decline is of the order of $r^{-1}$, whereas for d = −0.5 it is the
other way around. This observation explains the visual pattern in the figures
reported below in Section 3.1.5.
Notice that the MA coefficients $B_r^{(d)}$ for given d are the same as the AR
coefficients $A_r^{(d)}$ for −d, and vice versa. To see this, note that for $d = \bar{d}$,
$A_r^{(\bar{d})} = (-1)^r\,{}_{\bar{d}}C_{r}$ and $B_r^{(\bar{d})} = (-1)^r\,{}_{-\bar{d}}C_{r}$, whereas for $d = -\bar{d}$,
$A_r^{(-\bar{d})} = (-1)^r\,{}_{-\bar{d}}C_{r}$ and $B_r^{(-\bar{d})} = (-1)^r\,{}_{\bar{d}}C_{r}$.
The MA coefficients trace out the impulse response function for a unit shock
in εt . A point to note is that even though the sequence {yt } is nonstationary for
d ∈ [½, 1), the MA coefficients eventually decline. This property is often referred
to as mean reversion, but since the variance of yt is not finite for d ≥ 0.5, this term
There are two ways of approaching the definition of a fractional I(d) process,
referred to as partial summation and the direct approach (or definition), and
these are considered in the next two sections.
The difference from the standard unit root process is that the zt are generated
by an I(d − 1) process given by:
$$(1-L)^{d-1} z_t = u_t \;\Rightarrow\; z_t = (1-L)^{1-d} u_t = \sum_{r=0}^{\infty} B_r^{(d-1)} u_{t-r} \quad \text{for } d \in [1/2, 3/2) \qquad (3.25)$$
The increments zt are I(d − 1), with (d − 1) < ½, and the MA coefficients $B_r^{(d-1)}$
are defined in (3.19). Also notice that if d ∈ (½, 1), then d − 1 ∈ (−½, 0), so that zt,
on this definition, exhibits negative autocorrelation and is referred to as being
anti-persistent.
Noting that (1 − L)−1 is the summation operator and zt = 0 for t < 1, then
(3.26) can be rewritten as follows:
$$(1-L)^d (y_t - y_0) = u_t, \quad t \geq 1 \qquad \text{[multiply through by } (1-L)^{d-1}\text{]} \qquad (3.28)$$
where:
Note that the partial summation approach nests the unit root framework that is
associated with integer values of d. For purposes of tying in this approach with
the corresponding fractional Brownian motion (fBM) this will be referred to as
a type I fractional process.
Of particular interest for the direct definition of an I(d) process is the MA form
truncated to reflect the start of the process at t = 1, that is:
$$y_t - y_0 = (1-L)^{-d} u_t = \sum_{r=0}^{t-1} B_r^{(d)} u_{t-r}, \quad t \geq 1 \qquad (3.32)$$
This is a well-defined expression for all values of d, not just d ∈ (−½, ½).
The direct summation approach leads to what will be referred to as a type II
fractional process. Where it is necessary to distinguish between yt generated
according to the different definitions, they will be denoted $y_t^{(I)}$ and $y_t^{(II)}$,
respectively.
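For simulation, the truncated MA form (3.32) gives a direct way of generating a type II series from a sequence of shocks. The sketch below (function name illustrative) builds the $B_r^{(d)}$ weights by the recursion (3.15) and forms the truncated convolution; as a check, for d = 1 it reduces to the cumulative sum of the shocks, and for d = 0 it returns the shocks themselves.

```python
import numpy as np

def type2_frac(u, d):
    """Type II fractionally integrated series from (3.32):
    y_t - y_0 = sum_{r=0}^{t-1} B_r^(d) u_{t-r}, shocks before t = 1 set to zero."""
    u = np.asarray(u, dtype=float)
    T = len(u)
    B = np.ones(T)
    for r in range(1, T):
        B[r] = (r + d - 1) / r * B[r - 1]      # recursion (3.15)
    # truncated convolution: only shocks u_1, ..., u_t enter y_t
    return np.array([np.dot(B[:t + 1], u[t::-1]) for t in range(T)])
```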
where the term ξ0(d) captures the presample values of ut; this term is of the
same stochastic order as $y_t^{(II)}$ for d ∈ [1/2, 3/2), which means that it cannot be
ignored asymptotically, so that the initialisation, or how the pre-sample values
are treated, does matter.
where r, s ∈ [0, 1], |d| < ½, and W(s) is regular Brownian motion (BM) with variance
$\sigma_{lr}^2$, which is a standard BM if $\sigma_{lr}^2 = 1$, in which case it is usually referred to as B(s).
The unit root case is worth recalling briefly. Consider the simple random walk
$y_t = \sum_{j=1}^{t} z_j$, t = 1, …, T, which is an example of a partial sum process based on
the 'increments' zt ∼ I(0), which may therefore exhibit weak dependence and,
for simplicity, the initial value y0 has been set to zero. This can be written
equivalently by introducing the indexing parameter r:
$$y_T(r) = \sum_{t=1}^{[rT]} z_t \qquad (3.36)$$
The notation [rT] indicates the integer part of rT; thus, rT is exactly an integer
for r = j/T, j = 1, …, T and, for these values, gives the random walk over the
integers. However, yT(r) can be considered as a continuous function of r, albeit
a step function, with the 'steps' becoming increasingly smaller as T → ∞,
so that, in the limit, yT(r) is a continuous function of r. To consider this limit,
yT(r) is first normalised by $\sigma_{lr,z}\sqrt{T}$, where $\sigma_{lr,z}^2 < \infty$ is the 'long-run' variance of
zt, so that:
$$Y_T(r) \equiv \frac{y_T(r)}{\sigma_{lr,z}\sqrt{T}} \qquad (3.37)$$
$$y_T^{(I)}(r) = \sum_{t=1}^{[rT]} z_t \quad \text{where } z_t \sim I(d),\ |d| < 1/2,\ d \neq 0$$
and $y_T^{(I)}(r)$ is then normalised as follows:
$$Y_T^{(I)}(r) = \frac{y_T^{(I)}(r)}{\sigma_{lr,z}\, T^{1/2+d}} \qquad (3.39)$$
where $Y^{(I)}$ is defined in (3.34). An analogous result holds for a type II process:
$$Y_T^{(II)}(r) = \frac{y_T^{(II)}(r)}{\sigma_{lr,z}\, T^{1/2+d}} \;\Rightarrow_{D}\; Y^{(II)} \qquad (3.41)$$
The impact of the different forms of fBM on some frequently occurring esti-
mators is explored by Robinson (2005). (For example, if the properties of an
estimator of, say, d are derived assuming type I fBM, what are its properties if
type II fBM is assumed?)
$$(1-L)^d \tilde{y}_t = u_t, \qquad \tilde{y}_t \equiv y_t - \mu_t \qquad (3.43)$$
$$(1-L)^d \tilde{y}_t = \varepsilon_t 1_{(t>0)}, \quad t = 1, \ldots, T \qquad (3.44)$$
$$(1-L)^d \tilde{y}_t = \varepsilon_t, \quad t = -\infty, \ldots, 0, 1, \ldots, T \qquad (3.45)$$
As noted these two representations give rise to type I and type II fractional
Brownian motion (fBM) where the latter is associated with the truncated process
and the former with the non-truncated process; see Marinucci and Robinson
(1999), and Davidson and Hashimzade (2009), and see Section 3.1.5.
Recall that to write t = 1 is a matter of notational convenience; in practice,
the sample starting date will be a particular calendar date, for example, 1964q1,
which raises the question, for actual economic series, of whether it is reasonable
to set all shocks before the particular start date at hand to zero. As Davidson
and Hashimzade (2009) note, whilst in time series modelling such a truncation
often does not matter, at least asymptotically, in this case it does. This issue is
of importance in, for example, obtaining the critical values for a particular test
statistic that is a functional of type I fBM: it would be natural enough to
start the process at some specific date in a simulation setting, but type II fBM
may be a better approximation for 'real world' data. See Robinson (2005) for a
discussion of some of the issues.
(1 − L)yt = ut (3.47a)
(1 − L)d ut = w(L)εt (3.47b)
Provided that the roots of φ(L) are outside the unit circle (see UR, Vol. 1,
chapter 2), the ARFIMA(p, d, q) model can be inverted to obtain the moving
average representation given by:
(The polynomials in (3.46) are scalar so the rearrangement of their order in
(3.48) is valid.) The moving average polynomial is ω(L) = φ(L)−1 θ(L)(1 − L)−d ,
which is now the convolution of three polynomials.
$$\gamma(k) = \frac{(-1)^k\,(-2d)!}{(k-d)!\,(-k-d)!}\,\sigma_{\varepsilon}^2 \qquad (3.49a)$$
$$\;\;\; = \frac{(-1)^k\,\Gamma(1-2d)}{\Gamma(k-d+1)\,\Gamma(1-k-d)}\,\sigma_{\varepsilon}^2 \qquad (3.49b)$$
The variance is obtained by setting k = 0 in (3.49a) or (3.49b):
$$\gamma(0) = \frac{(-2d)!}{[(-d)!]^2}\,\sigma_{\varepsilon}^2 \qquad (3.50)$$
Note that the variance, γ(0), is not finite for d ≥ 0.5. Throughout it has been
assumed that µt = 0, otherwise yt should be replaced by yt − µt .
3.2.1.ii Autocorrelations
The lag k autocorrelations, ρ(k) ≡ γ(k)/γ(0), are given by:
$$\rho(k) = \frac{(-d)!\,(k+d-1)!}{(d-1)!\,(k-d)!}, \quad k = 0, \pm 1, \ldots \qquad (3.51a)$$
$$\;\;\; = \frac{d(1+d)\cdots(k-1+d)}{(1-d)(2-d)\cdots(k-d)} \qquad (3.51b)$$
$$\;\;\; = \frac{\Gamma(1-d)\,\Gamma(k+d)}{\Gamma(d)\,\Gamma(k+1-d)} \qquad (3.51c)$$
As k → +∞, then $\Gamma(k+d)/\Gamma(k+1-d) \to k^{d-(1-d)} = k^{2d-1}$, therefore:
$$\rho(k) \to \frac{\Gamma(1-d)}{\Gamma(d)}\, k^{2d-1} \qquad (3.52)$$
As γ(k) and ρ(k) are related by a constant (the variance), they have the same
order, namely $O(k^{2d-1})$.
The first-order autocorrelation coefficient for d < 0.5 is obtained by setting
k = 1 in (3.51a), that is:
$$\rho(1) = \frac{(-d)!\,d!}{(d-1)!\,(1-d)!} = \frac{d}{1-d} \qquad (3.53)$$
Note that ρ(1) → 1 as d → 0.5. Application of (3.53) for d > 0.5 results in ρ(1) > 1,
which is clearly invalid for an autocorrelation.
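These closed forms are straightforward to evaluate with the Gamma function. The sketch below (function name illustrative) implements (3.51c); for d = 1/3 it reproduces ρ(1) = d/(1 − d) = 0.5, and the slow hyperbolic decay is evident at longer lags.

```python
from math import gamma

def rho(k, d):
    """Autocorrelation of fractional noise, (3.51c), for d < 0.5 and d != 0:
    rho(k) = Gamma(1 - d) Gamma(k + d) / (Gamma(d) Gamma(k + 1 - d))."""
    return gamma(1 - d) * gamma(k + d) / (gamma(d) * gamma(k + 1 - d))
```

For comparison, an AR(1) process with the same first-order autocorrelation of 0.5 has ρ(10) = 0.5**10 ≈ 0.001, whereas here ρ(10) ≈ 0.23.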
From (3.52), the behaviour of the sum of the autocorrelations will be determined
by the term $k^{2d-1}$. Clearly, such a sum will not be finite if d ≥ 0.5
because the exponent on k will then be greater than or equal to 0, but is d < 0.5 a sufficient
condition? The answer is no. The condition for its convergence can be established
by the p-series convergence test (a special case of the integral test); write
$k^{2d-1} = 1/k^{-(2d-1)}$, then convergence requires that −(2d − 1) > 1, that is d < 0.
(The case d = 0 is covered trivially as all ρ(k) apart from ρ(0) are zero.) It is this
nonsummability aspect that is sometimes taken as the defining feature of 'long
memory'.
$$\rho(k)_{\mathrm{inv}} \to \frac{\Gamma(d+1)}{\Gamma(-d)}\, |k|^{-2d-1} \qquad (3.55)$$
Note that the inverse autocorrelations are obtained by replacing d by −d in the
standard autocorrelations (hence $\rho(k\,|\,d)_{\mathrm{inv}} = \rho(k\,|\,{-d})$).
[Figure panels omitted: plots of the autocorrelations, inverse autocorrelations and their sums against lag k, for d = 0.45, 0.3, −0.3 and −0.45.]
The slow decline and non-summability are more evident as d moves toward the
nonstationary boundary of 0.5. The autocorrelations for negative d are slight,
decline quickly and are summable.
The corresponding MA and AR coefficients and their respective sums are
shown in Figures 3.2a–3.2d and Figures 3.3a–3.3d, respectively. These are plots
of the $B_r^{(d)}$ coefficients from (3.15) and the $A_r^{(d)}$ coefficients from (3.14), respectively.
Again it is evident from the positive and declining MA coefficients in
Figure 3.2a that positive values of d are more likely to characterize economic
time series than negative values of d, which, as Figure 3.2b shows, generate
negative MA coefficients. Note also that the AR coefficients, and their respective
sums, in Figures 3.3a to 3.3d are simply the negative of the corresponding
MA coefficients in Figures 3.2a to 3.2d.
To give a feel for what time series generated by fractional d look like, simulated
time series plots are shown in Figures 3.4a to 3.4d. In these illustrations d = −0.3
(to illustrate 'anti-persistence'), then d = 0.3 and d = 0.45 each illustrate some
persistence, and d = 0.9 illustrates the nonstationary case. In addition, to mimic
economic time series somewhat further, some serial correlation is introduced
into the generating process; in particular, the model is now ARFIMA(1, d = 0.9,
0) with φ1 = 0.3, 0.5, 0.7 and 0.9 in Figures 3.5a to 3.5d, respectively. As the degree
of serial correlation increases, the series becomes 'smoother' and characterises
well the typical features of economic time series.
[Figures 3.4a–3.4d and 3.5a–3.5d omitted: simulated time series, T = 500; as φ1 increases the series becomes 'smoother'.]
The essence of the error duration model, EDM, is that there is a process gen-
erating stochastic shocks in a particular economic activity that have a lifetime,
or duration, which is itself stochastic, and that the generic time series variable
yt is the sum of those shocks surviving to period t. The probability of a shock
originating in period s surviving to period t, referred to as the survival prob-
ability, is critical in determining whether the process is one with inherently
long memory (defined as having nonsummable autocovariances). Parke (op.
cit.) gives two examples of the EDM. In the first, shocks to aggregate employ-
ment originate from the birth and death of firms; if a small proportion of firms
have a very long life, then long memory is generated in aggregate employment.
The second example concerns financial asset positions: once an asset position is
taken, how long will it last? This is another example of a survival probability; if
some positions, once taken, are held on a long term basis, there is the potential
for a process, for example that of generating volatility of returns, to gener-
ate long memory. This line of thought generates a conceptualisation in which
events generate shocks, for example productivity shocks, oil price shocks, stock
market crashes, which in turn affect and potentially impart long memory into
economic activity variables, such as output and employment. Equally, one can
regard innovations as ‘shocks’ in this sense; for example, mobile phone technol-
ogy, plasma television and computer screens, and then consider whether such
shocks have a lasting effect on consumer expenditure patterns.
shock has a survival period; it originates in period s and survives until period
s + ns. The duration or survival period of the shock, (s + ns) − (s − 1) = ns + 1
periods, is a random variable. Define a function gs,t that indicates whether a shock
originating in period s survives to period t; thus, gs,t = 0 for t > s + ns, and
gs,t = 1 for s ≤ t ≤ s + ns; the function is, therefore, a form of indicator function. The shock
and its probability of survival are assumed to be independent. The probability
that the shock survives until period s +k is pk = p(gs,s+k = 1), which defines
a sequence of probabilities {pk } for k ≥ 0. For k = 0, corresponding to gs,s = 1,
assume p0 = 1, so that a shock lasts its period of inception; also, assume that the
sequence of probabilities {pk } is monotone non-increasing. The {pk } are usually
referred to as the survival probabilities. A limiting case is when the process has
no memory, so that pk = p(gs,s+k = 1) = 0 for k ≥ 1. Otherwise, the process has
a memory characterised by the sequence {pk}, and the interesting question is
the persistence of this memory, and whether it qualifies, in the sense of the McLeod
and Hipel (1978) definition, to be described as long memory.
Finally, the realisation of the process at time t is yt, which is the sum of all the
errors that survive until period t; that is:
$$y_t = \sum_{s=-\infty}^{t} g_{s,t}\, \xi_s \qquad (3.56)$$
$$\gamma(1) = p_1 + p_2 + p_3 + \cdots$$
$$\gamma(2) = p_2 + p_3 + p_4 + \cdots$$
$$\vdots$$
$$\gamma(k) = p_k + p_{k+1} + p_{k+2} + \cdots \qquad (3.57)$$
The pattern is easy to see from these expressions, and the sum of the autocovariances,
for k = 1, …, n, is:
$$\sum_{k=1}^{n} \gamma(k) = p_1 + 2p_2 + 3p_3 + \cdots = \sum_{k=1}^{n} k\,p_k \qquad (3.58)$$
Hence, the question of long memory concerns whether $\sum_{k=1}^{n} k\,p_k$ converges as n → ∞.
(Omitting γ(0) from the sum does not affect the condition for convergence.) The
survival probabilities are assumed not to increase; let them be characterised as:
$$p_k = c\,k^{\alpha} \qquad (3.59)$$
where c is a positive constant and α < 0, such that $0 \leq ck^{\alpha} \leq 1$; α > 0 is ruled out
as otherwise pk > 1 is possible for k large enough. (All that is actually required is
that the expression for pk in (3.59) holds in the limit as k → ∞.)
The individual terms in $\sum_{k=1}^{n} k\,p_k$ are given by $k\,p_k = k(ck^{\alpha}) = ck^{1+\alpha}$. The
series to be checked for convergence is, therefore:
$$\sum_{k=1}^{n} \gamma(k) = c + 2c(2^{\alpha}) + 3c(3^{\alpha}) + \cdots + kc(k^{\alpha}) + \cdots$$
This outline shows that the error duration model can generate a series yt that is
integrated of order d, that is I(d). Hence, applying the d-th difference operator
$(1-L)^d$ to yt reduces the series to stationarity; that is, $(1-L)^d y_t = \varepsilon_t$, where
εt is iid.
$$p_k = \frac{\Gamma(k+d)\,\Gamma(2-d)}{\Gamma(k+2-d)\,\Gamma(d)} \qquad (3.61)$$
with the following recursion starting from p0 = 1:
$$p_{k+1} = \frac{k+d}{k+2-d}\; p_k \qquad (3.62)$$
Note that the pk in (3.61) satisfy (3.59) as a limiting property. The recursion
shows that the ratio of successive probabilities pk+1 /pk , referred to as the condi-
tional survival probabilities, tends to one as k increases; thus, there is a tendency
for shocks that survive a long time to continue to survive. Figure 3.6 plots the
survival probabilities and the conditional survival probabilities for d = 0.15 and
d = 0.45. The relatively slow decline in the survival probabilities is more evident
in the latter case; in both cases pk+1 /pk → 1 as k → ∞.
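The closed form (3.61) and the recursion (3.62) are easily checked against each other; in the sketch below (function names illustrative), the closed form is written so that p₀ = 1 and the recursion (3.62) is satisfied.

```python
from math import gamma

def p_closed(k, d):
    """Survival probability, closed form as in (3.61)."""
    return gamma(k + d) * gamma(2 - d) / (gamma(k + 2 - d) * gamma(d))

def p_recursive(n, d):
    """Survival probabilities p_0, ..., p_n via the recursion (3.62), p_0 = 1."""
    p = [1.0]
    for k in range(n):
        p.append((k + d) / (k + 2 - d) * p[-1])
    return p
```

The conditional survival probability p_{k+1}/p_k = (k + d)/(k + 2 − d) increases monotonically towards one, which is the 'shocks that survive a long time tend to continue to survive' property noted above.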
[Figure 3.6 omitted: survival probabilities (k up to 30) and conditional survival probabilities (k up to 50) for d = 0.15 and d = 0.45; the conditional survival probabilities → 1 as k → ∞.]
Table 3.1 Empirical survival rates Sk and conditional survival rates Sk /Sk−1 for US firms
Year k ⇒ 1 2 3 4 5 6 7 8 9 10
Sk 0.812 0.652 0.538 0.461 0.401 0.357 0.322 0.292 0.266 0.246
Sk /Sk−1 0.812 0.803 0.826 0.857 0.868 0.891 0.902 0.908 0.911 0.923
Source: Parke (1999); based on Nucci’s (1999) data, which is from 5,727,985 active businesses in the
1987 US Bureau of the Census Statistical Establishment List.
3.3.1.v The survival probabilities in a short-memory, AR(1), process
It is useful to highlight the difference between the survival probabilities for a
long-memory process and those for a short-memory process, taking the AR(1)
model as an example of the latter. Consider the AR(1) model (1 − φ1L)yt = ηt,
where ηt is iid and 0 ≤ φ1 < 1. The autocorrelation function is $\rho(k) = \phi_1^k$, with
recursion ρ(k + 1) = φ1ρ(k). This matches the case where the conditional survival
probabilities are constant, that is, pk+1/pk = φ1 starting with p0 = 1; thus, p1 =
φ1p0, p2 = φ1p1 and so on. Hence, the survival probability pk is just the k-th
autocorrelation, ρ(k). The AR(1) model with φ1 = 0.5 and the I(d) model with
d = 1/3 have the same first order autocorrelation of 0.5, but the higher-order
autocorrelations decline much more slowly for the I(d) model.
Hence, solving for α and then d by taking logs and rearranging, we obtain:
$$\ln(p_k/p_{k+j}) = \ln\!\left[\left(\frac{k}{k+j}\right)^{\alpha}\right] \qquad (3.64)$$
$$\Rightarrow\; \alpha = \frac{\ln(p_k/p_{k+j})}{\ln[k/(k+j)]} \qquad (3.65)$$
$$\Rightarrow\; d = 1 + 0.5\,\frac{\ln(p_k/p_{k+j})}{\ln[k/(k+j)]} \qquad (3.66)$$
Given data, as in Table 3.1, an estimate, d̂, can be obtained by choosing k and j;
for example, k = 5 and j = 5 gives d̂ = 0.65, indicating long memory and nonstationarity.
With an estimate of d, an estimate of the scaling factor c can be
obtained from $c = p_k/k^{\alpha}$ for a choice of k. For d̂ = 0.65 and k = 10, Parke obtains
c = 1.25 by ensuring that p10 = S10, so the estimated model is:
$$p_k = 1.25\, k^{-2+2(0.65)} \qquad (3.67)$$
This estimated function is graphed in Figure 3.7, and fits the actual survival rates
particularly well from S5 onward, but not quite as well earlier in the sequence;
note also that $p_1 = 1.25(1)^{-2+2(0.65)} = 1.25 > 1$, which is invalid for a probability.
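Parke's calculation can be reproduced directly from the survival rates in Table 3.1, using (3.65) and (3.66) for d̂ and then anchoring c so that the fitted probability matches the observed rate at k = 10. A sketch (the data are the Sk values from Table 3.1; the function name is illustrative):

```python
from math import log

# Survival rates S_k for US firms, Table 3.1 (Parke, 1999; Nucci's data)
S = {1: 0.812, 2: 0.652, 3: 0.538, 4: 0.461, 5: 0.401,
     6: 0.357, 7: 0.322, 8: 0.292, 9: 0.266, 10: 0.246}

def estimate_d_c(k, j, anchor):
    """alpha from (3.65), d from (3.66), and c anchored so that p_anchor = S_anchor."""
    alpha = log(S[k] / S[k + j]) / log(k / (k + j))
    d = 1 + 0.5 * alpha
    c = S[anchor] / anchor ** alpha
    return d, c

d_hat, c_hat = estimate_d_c(5, 5, 10)   # k = 5, j = 5, anchor at S_10
```

With k = 5 and j = 5 this gives d̂ ≈ 0.65 and c ≈ 1.25, as in the text.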
An alternative to using just one choice of k and j, which might reduce the
problem of the estimated probabilities above one, is to average over all possible
choices. There are 9 combinations for j = 1, that is the set comprising the pairs
(1, 2), (2, 3), …, (9, 10); 8 combinations for j = 2, through to 1 combination for
j = 9, that is the pair (1, 10). In all there are $\sum_{j=1}^{9} j = 45$ combinations; evaluating
these and averaging, we obtain d̂ = 0.69 and ĉ = 1.08, so the fitted probability
function is:
$$p_k = 1.08\, k^{-2+2(0.69)} \qquad (3.68)$$
[Figure 3.7 omitted: actual survival rates and the probability functions estimated using data for S5 and S10, and using all the data, plotted against survival duration in years.]
This function fits the actual survival rates particularly well from S3 onward; the
problem with the starting probabilities has been reduced but not completely
removed. The survival rates and fitted values from the revised estimate are also
shown in Figure 3.7.
3.3.4 Aggregation
Granger (1980) showed that an I(d) process, for d fractional, could result from
aggregating micro relationships that were not themselves I(d). Chambers (1998)
has extended and qualified these results (and see also Granger, 1990). We deal
with the simplest case here to provide some general motivation for the genera-
tion of I(d) processes; the reader is referred to Granger (1980, 1990) for detailed
derivations and other cases involving dependent micro processes.
There are many practical cases where the time series considered in economics
are aggregates of component series; for example, total unemployment aggre-
gates both male unemployment and female unemployment, and each of these,
in turn aggregates different age groups of the unemployed or unemployment
chapter; and the reader unfamiliar with frequency domain concepts will need
to first review Sections 4.1.1 to 4.1.4.
where the εkt are zero-mean, independent white noise ‘shocks’ (across the com-
ponents and time). The aggregate variable, which in this case is just the sum of
these two series, is denoted yt , so that:
In Granger’s model, the AR(1) coefficients φk1 are assumed to be the outcomes
of a draw from a population with distribution function F(φ1 ). To illustrate the
argument, Granger (1980) uses the beta distribution, and whilst other distribu-
tions would also be candidates, the link between the beta distribution and the
gamma distribution is important in this context.
3.3.4.ii The AR(1) coefficients are ‘draws’ from the beta distribution
The beta distribution used in Granger (1980) is:

dF(φ1) = [2/B(u, v)] φ1^{2u−1} (1 − φ1²)^{v−1} dφ1    (3.73)
Copyright material from www.palgraveconnect.com - licensed to Glasgow University Library - PalgraveConnect - 2015-10-19
where 0 ≤ φ1 ≤ 1, u, v > 0, and B(u, v) is the beta function with param-
eters u and v. The coefficients φk1 , k = 1, 2, are assumed to be drawn
from this distribution. The beta function is B(u, v) = ∫₀¹ φ1^{u−1}(1 − φ1)^{v−1} dφ1 =
2∫₀¹ φ1^{2u−1}(1 − φ1²)^{v−1} dφ1, which operates here to scale the distribution func-
tion so that its integral is unity over the range of φ1. In this case, the k-th
autocovariance, γ(k), with v > 1, is given by:
γ(k) = B(u + k/2, v − 1) / B(u, v)    (3.74a)
     = Γ(v − 1)Γ(k/2 + u) / [B(u, v) Γ(k/2 + u + v − 1)]    (3.74b)

so that, for large k, the autocovariances decline hyperbolically, γ(k) ≈ C k^{1−v},
where C is a constant.
On comparison with (3.52), 1 − v = 2d − 1 and, therefore, d = 1 − (1/2)v; thus,
the aggregate yt is integrated of order 1 − (1/2)v; for example, if v = 1.5, then
d = 0.25. Typically, the range of interest for d in the case of aggregate time series
is the long-memory region, d ∈ (0, 1], which corresponds to v ∈ [0, 2), with
perhaps particular attention on v ∈ [0, 1], corresponding to d ∈ [0.5, 1].
Note that the parameter u does not appear in the order of integration; it
is v that is the critical parameter, especially for φ1 close to unity (see Granger,
1980).
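As a rough numerical check of this mechanism, the following sketch (illustrative parameter values, stdlib only) draws φ_k1² from a Beta(u, v) distribution — equivalent to drawing φ_k1 from the density (3.73) — and sums many independent AR(1) components; with v = 1.5 the implied order of integration of the aggregate is d = 1 − v/2 = 0.25:

```python
import random

def simulate_aggregate(n_units=200, T=400, u=1.0, v=1.5, seed=0):
    """Granger (1980)-style aggregation sketch: phi_k1^2 ~ Beta(u, v),
    so phi_k1 is a draw from the density (3.73); the sum of many
    independent AR(1) components behaves approximately as I(1 - v/2)."""
    rng = random.Random(seed)
    y = [0.0] * T
    for _ in range(n_units):
        phi = rng.betavariate(u, v) ** 0.5  # phi in [0, 1)
        x = 0.0
        for t in range(T):
            x = phi * x + rng.gauss(0.0, 1.0)
            y[t] += x
    return y

implied_d = 1.0 - 0.5 * 1.5  # d = 1 - v/2 = 0.25 for v = 1.5
```

The sample autocorrelations of such an aggregate decay much more slowly than those of any individual component, which is the long-memory signature discussed above.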
3.3.4.iii Qualifications
Granger (1990) considers a number of variations on the basic model including
changing the range of φ1 ; introducing correlation in the εit sequences, and
allowing the component series to be dependent; generally these do not change
the essential results. However, Granger (1990) notes that if there is an upper
bound to φ1, say φ̄1 < 1, so that P(φ1 ≥ φ̄1) = 0, then the fractional d result
no longer holds. Chambers (1998) shows that linear aggregation, for example
in the form of simple sums or weighted sums, does not by itself lead to an
aggregate series with long memory. As Chambers notes (op.cit.), long memory
in the aggregate requires the existence of long memory in at least one of the
components, and the value of d for the aggregate will be the maximum value of
d for the components. In this set-up, whilst the aggregation is linear, the values
of φk1 are not draws from a beta distribution.
Dickey-Fuller tests when the series is fractionally integrated
Given the widespread use of Dickey-Fuller tests, a natural question to ask is: what
are their characteristics when the time series is generated by a fractionally inte-
grated process? It is appropriate to start with the LS estimator ρ̂ in the following
simple model:
yt = ρy_{t−1} + εt    (3.76)

where the data are generated by the fractionally integrated process:

(1 − L)(yt − y0) = ut    (3.77a)
(1 − L)^d ut = εt    (3.77b)

Sowell's results for the limiting distribution of ρ̂ can then be stated as:

T(ρ̂ − 1) ⇒_D ½ B_d(1)² / ∫₀¹ B_d(r)² dr    for d ∈ (0, 0.5)    (3.78)

T^{(1+2d)}(ρ̂ − 1) ⇒_D −(d + ½)Γ(1 + d) / [Γ(1 − d) ∫₀¹ B_d(r)² dr]    for d ∈ (−0.5, 0)    (3.79)

where B_d(r) is a standard (type I) fBM. (This statement of Sowell's results embod-
ies the amendment suggested by Marinucci and Robinson (1999), which replaces
type II fBM with type I fBM.)
Note that the normalisation needed to ensure the convergence of ρ̂ depends on d
and, in general, one can refer to the limiting distribution of T^{min(1,1+2d)}(ρ̂ − 1).
In the case that d = 0, one recovers the standard unit root case from the first of
these expressions (3.78).
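For simulation work with DGPs such as (3.77a)-(3.77b), the MA(∞) weights of (1 − L)^{−d} can be built recursively (ψ0 = 1, ψj = ψ_{j−1}(j − 1 + d)/j). A minimal sketch, assuming a type II (zero pre-sample) initialisation:

```python
def arfima_noise(d, eps):
    """u_t = (1 - L)^{-d} eps_t with pre-sample values set to zero.
    The MA weights satisfy psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j."""
    psi = [1.0]
    for j in range(1, len(eps)):
        psi.append(psi[-1] * (j - 1 + d) / j)
    return [sum(psi[j] * eps[t - j] for j in range(t + 1))
            for t in range(len(eps))]
```

Setting d = 0 returns the input shocks unchanged; the partial sums of u_t then give an I(1 + d) series as in (3.77a).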
[Figure: power of the DF test statistics τ̂, τ̂µ and τ̂β plotted against d, for d ∈ (0.4, 1).]
yt = Σ_{r=1}^∞ π_r^{(d)} y_{t−r} + εt,  where π_r^{(d)} ≡ −A_r^{(d)}    (3.80)
   = π^{(d)}(L) yt + εt

Applying the DF decomposition of the lag polynomial to π^{(d)}(L) (see UR, Vol. 1,
chapter 3) we obtain:

Δyt = γ_∞ y_{t−1} + Σ_{j=1}^∞ c_j Δy_{t−j} + εt    (3.81)
(Note that (3.83) follows on comparison of (3.82) with (3.81).) Hence, γ_k will
not, generally, be zero, but it will tend to zero as k → ∞, since Σ_{r=k+1}^∞ π_r^{(d)} → 0.
This means that, despite the generating process not being the one assumed under
the null hypothesis, as k increases it becomes increasingly difficult to discover this
from the ADF regression. It was this result, supported by Monte Carlo simula-
tions, that led Hassler and Wolters (1994) to the conclusion that the probability
of rejection of the unit root null in favour of a fractional alternative decreases
for the τ̂-type ADF test as k increases. An extract from their simulation results is
shown in Table 3.2 below. For example, if d = 0.6, so that Δyt is I(−0.4), then the
probability of rejection, whilst increasing with T, decreases with k, from 86.6%
for ADF(2) to 28.8% for ADF(10) when T = 250. Hassler and Wolters (1994) note
that the PP test, which is designed around the unaugmented DF test, does not
suffer from a loss of power as the truncation parameter in the semi-parametric
estimator of the variance increases.

Table 3.2 Power of ADF τ̂µ test for fractionally integrated series
Source: Extracted from Hassler and Wolters, 1994, table 1.
Notes: The generating process is (1 − L)^d yt = εt; the maintained regression is
Δyt = µ* + γ_k y_{t−1} + Σ_{j=1}^{k−1} c_j Δy_{t−j} + ε_{t,k}, with 5,000 replications.
Krämer (1998) points out that consistency can be retrieved if k, whilst increas-
ing with T, does not do so at too fast a rate. A sufficient condition, where
d ∈ (−0.5, 0.5), to ensure divergence of τ̂ is k = o(T^{1/2+d}), in which case
τ̂ →p −∞ for d < 0 and τ̂ →p +∞ for d > 0. For further details and development see Krämer
(1998). Of course, from a practical point of view, the value of d is not known but
it may be possible in some circumstances to narrow the range of likely values of
d in order to check that this condition is satisfied.
Given difficulties with the standard ADF test, if fractional integration is sus-
pected a more sensible route is to use a test specifically designed to test for this
feature. There are a number of such tests; for example one possibility is simply
to estimate d and test it against a specific alternative. This kind of procedure is
considered in the next chapter. In this chapter we first consider test procedures
based on the general DF approach, but applied to the fractional case.
This section introduces an extension of the standard Dickey-Fuller (DF) test for
a unit root to the case of fractionally integrated series. This is a time series
approach, whereas frequency domain approaches are considered in the next
chapter. The developments outlined here are due to Dolado, Gonzalo and
Mayoral (2002), hereafter DGM, as modified by Lobato and Velasco (2007), here-
after LV. The fractional DF tests are referred to as the FDF tests and the efficient
FDF tests as the EFDF tests.
where εt ∼ iid(0, σε2 ) and γ = (ρ − 1). If yt is I(1), then the regression (3.85) is
unbalanced in the sense that the orders of integration of the regressand and
regressor are different, being I(0) and I(1) respectively. The LS (or indeed any
consistent) estimator of γ should be zero asymptotically to reflect this lack of
balance. The regression (3.85) is relevant to testing the null hypothesis of a unit
root against the alternative of stationarity, that is, in the context of the notation
of FI processes, H0 : d = 1 against HA : d = 0.
The essence of the FDF test is that the regression and testing principle can be
applied even if yt is fractionally integrated. Consider the maintained regression
for the case in which H0 : d = d0 against the simple (point) alternative HA : d
= d1 < d0 ; then, by analogy, the following regression model could be formulated:
Δ^{d0} yt = γΔ^{d1} y_{t−1} + zt    (3.86)

The superscripts on Δ refer to the value of d under the null and the (simple)
alternative hypothesis(es), or under specific non-unit root alternatives, respec-
tively, and the properties of zt have yet to be specified. In the standard DF case,
the null hypothesis is d0 = 1 and the (simple) alternative is dA = 0; thus, under
the null Δ^{d0} yt = Δ¹yt = Δyt and Δ^{d1} y_{t−1} = Δ⁰y_{t−1} = y_{t−1}, resulting in the DF regression
Δyt = γy_{t−1} + εt (so that in this special case zt = εt). However, in the FDF case,
neither value of d is restricted to an integer.
DGM suggest using Equation (3.86), or generalisations of it, to test H0 : d = d0
against the simple (point) alternative HA : d = d1 < d0 or the general alternative
HA : d < d0 . The DF-type test statistics are constructed in the usual way as, for
example, with an appropriately normalised version of γ̂ = (ρ̂ − 1) or the t statistic
on γ̂, for the null that γ = 0, denoted t̂γ . In the case that a simple alternative is
specified, the value of d = d1 under the alternative is known and can be used as
in (3.86); however, the more general (and likely) case is HA : 0 ≤ d < d0 , or some
other composite hypothesis, and a (single) input value of d, say d1 for simplicity
of notation, is required to operationalise the test procedure. In either case LS
estimation of (3.86) gives the following:
γ̂(d1) = Σ_{t=2}^T (Δ^{d0} yt)(Δ^{d1} y_{t−1}) / Σ_{t=2}^T (Δ^{d1} y_{t−1})²    (3.87)

tγ(d1) = γ̂(d1) / σ_{d1}(γ̂)    (3.88)

σ_{d1}(γ̂) = [σ²_{d1} / Σ_{t=2}^T (Δ^{d1} y_{t−1})²]^{1/2}

σ²_{d1} = Σ_{t=2}^T ε̂t² / T
In the special case where d0 = 1, γ̂(d1) is given by:

γ̂(d1) = Σ_{t=2}^T (Δyt)(Δ^{d1} y_{t−1}) / Σ_{t=2}^T (Δ^{d1} y_{t−1})²    (3.89)
where s is the integer part of (d1 + ½), which reflects the assumption that pre-
sample values of yt are zero. (DGM, op. cit., report that this initialisation,
compared to the alternative that the sequence {yt}_{t=−∞}^{0} exists, has only minor
effects on the power of the test; see also Marinucci and Robinson, 1999, for
discussion of the general point.)
For the case of interest here, there are two key results, as follows, that inform
subsequent sections. Let the data be generated by a random walk with iid errors:

Δyt = εt    (3.92)

where εt ∼ iid(0, σ_ε²) and E|εt⁴| < ∞. Then:

tγ(d1) ⇒_D N(0, 1)    (3.93)
see DGM, op. cit., theorem 5. Thus provided that the range of d̂T is restricted
(although not unduly so), standard N(0, 1) testing procedures apply.
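The computations in (3.87) and (3.88) are simple to sketch. The helper below builds the truncated fractional difference Δ^d (pre-sample values zero, as assumed above) and returns the t statistic on γ̂; the series and input values are purely illustrative:

```python
import math

def frac_coeffs(d, n):
    # A_r^{(d)} = (-1)^r C(d, r), via the recursion A_r = A_{r-1} (r - 1 - d) / r
    a = [1.0]
    for r in range(1, n + 1):
        a.append(a[-1] * (r - 1 - d) / r)
    return a

def frac_diff(d, y):
    # truncated Delta^d y_t = sum_{r=0}^{t} A_r^{(d)} y_{t-r}
    a = frac_coeffs(d, len(y) - 1)
    return [sum(a[r] * y[t - r] for r in range(t + 1)) for t in range(len(y))]

def fdf_t_stat(y, d0, d1):
    """LS t statistic on gamma in Delta^{d0} y_t = gamma Delta^{d1} y_{t-1} + eps_t,
    as in (3.87)-(3.88)."""
    x0 = frac_diff(d0, y)
    x1 = frac_diff(d1, y)
    num = sum(x0[t] * x1[t - 1] for t in range(1, len(y)))
    den = sum(x1[t - 1] ** 2 for t in range(1, len(y)))
    gamma = num / den
    s2 = sum((x0[t] - gamma * x1[t - 1]) ** 2 for t in range(1, len(y))) / len(y)
    return gamma / math.sqrt(s2 / den)
```

Under the unit root null, d0 = 1 and the regressand is simply Δyt; the statistic is then compared with standard normal critical values, as in (3.93).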
ϕ(L)Δyt = Δ^{1−d} εt
        = εt + (d − 1)ε_{t−1} + 0.5d(d − 1)ε_{t−2} + · · ·    (3.95)

The second line of (3.95) uses the binomial expansion of Δ^{1−d} ≡ (1 − L)^{1−d};
see Equations (3.3) to (3.5). The third line of (3.95) uses ϕ(L)Δ^d y_{t−1} =
ε_{t−1}. Substituting for ϕ(L) in (3.95) and using the BN decomposition:

Δyt = γΔ^d y_{t−1} + [1 − ϕ(L)]Δyt + z*_t    (3.97b)
    = γΔ^d y_{t−1} + Σ_{j=1}^p ϕ_j Δy_{t−j} + z*_t    (3.97c)

γ = (d − 1)ϕ(1)    (3.97d)

z*_t = zt + (d − 1)ϕ*(L)Δ^{1+d} y_{t−1}    (3.97e)
Note that (3.97a) follows on adding [1 − ϕ(L)]Δyt to both sides of the previous
equation, where 1 − ϕ(L) = Σ_{j=1}^p ϕ_j L^j. Also note that if d = 1, then the terms in
(d − 1) disappear and, therefore, ϕ(L)Δyt = εt; if d = 0, then ϕ(L)yt = εt (see Q3.5);
also, if ϕ(1) > 0, then γ becomes more negative as d falls further below unity. Note
also that the case with ϕ(1) = 1 − Σ_{j=1}^p ϕ_j = 0 is ruled out because it implies a
unit root (or roots) in ϕ(L), and ϕ(1) > 0 implies the stability condition
Σ_{j=1}^p ϕ_j < 1. This, as in the simple case, motivates the estimation of (3.97c)
and use of the t-statistic on γ̂. (However, as in the simple case, LS estimation is
inefficient because of the serially correlated property of z*_t; see the EFDF test below.)
In the DGM procedure, d is (generally) replaced by d1, and the regression to
be estimated, the AFDF(p) regression, is:

Δyt = γΔ^{d1} y_{t−1} + Σ_{j=1}^p ϕ_j Δy_{t−j} + z*_t(d1)    (3.98)
Replacement of d1 with d̂T , where d̂T is a consistent estimator, does not alter the
consistency of the test. We also discuss later (see section 3.8.3) an alternative
formulation of the augmented FDF test, the pre-whitened AFDF, due to Lobato
and Velasco (2006), which considers the optimal choice of d1 .
FI(d) process:

Δ^d yt = εt 1_{(t>0)}    (3.99)

The start-up condition εt 1_{(t>0)} indicates that the process is initialized at the start
of the sample (which, incidentally, identifies the resulting fractional Brownian
motion as type II). Where the context is clear, this initialization will be left
implicit. The usual assumptions on εt are that it has finite fourth moment and
the sequence {εt} is iid.
Now note that by a slight rearrangement (3.99) may be written as follows:

Δ^{d−1} Δyt = εt ⇒
Δyt = Δ^{1−d} εt
    = εt + (d − 1)ε_{t−1} + 0.5d(d − 1)ε_{t−2} + · · ·

The third line uses ε_{t−1} = Δ^d y_{t−1}, from one lag of Equation (3.99), and note that:

Δyt ≡ Δyt − Δ^d yt + εt
    ≡ (1 − Δ^{d−1})Δyt + εt
This is, so far, just a restatement of an FI(d) process, and though in that sense
trivial it maintains εt as the ‘error’, in contrast to the serially correlated zt in
Equation (3.100). It may be interpreted as a testing framework in the following
way. First, introduce the coefficient π on the variable (Δ^{d−1} − 1)Δyt, which then
enables a continuum of cases to be distinguished, including d = 0 and d = 1.
That is:

Δyt = π(Δ^{d−1} − 1)Δyt + εt

If π = 0, then Δyt ≡ εt, so that d = 1, whereas if π = −1, then Δ^d yt ≡ εt for d ≠ 1.
Although the variable (Δ^{d−1} − 1)Δyt appears to include Δyt, this is not the
case. Specifically, from an application of Equation (3.3), note that Δ^{d−1}Δyt =
Σ_{r=0}^{t−1} A_r^{(d−1)} Δy_{t−r}, where A_0^{(d−1)} ≡ 1 and A_r^{(d−1)} ≡ (−1)^r C(d−1, r), and, therefore:

(1 − Δ^{d−1})Δyt = Δyt − [1 − (d − 1)L + A_2^{(d−1)} L² + · · · ]Δyt
                = (d − 1)Δy_{t−1} − Σ_{r=2}^{t−1} A_r^{(d−1)} L^r Δyt    (3.104)
Δ^{d1} yt = Δ^{d1−d} εt    (3.105)

Using the input value d1, the suggested regressor and associated quantities are:

z_{t−1}(d1) ≡ [(Δ^{d1−1} − 1)/(1 − d1)] Δyt    (3.108)

η = π(1 − d1)    (3.109)

vt = Δ^{d1−d} εt    (3.110)
This set-up, therefore, differs from that of DGM because the suggested regressor
is z_{t−1}(d1) and not Δ^{d1} y_{t−1}. The problem as d1 → 1 is solved because apply-
ing L'Hôpital's rule results in (Δ^{d1−1} − 1)/(1 − d1) → −ln(1 − L) = Σ_{j=1}^∞ j^{−1} L^j.
Considering this limiting case, the regression model becomes:

Δyt = η Σ_{j=1}^∞ j^{−1} Δy_{t−j} + vt    (3.111)
d̂T = d1 + op(T^{−κ}),  d̂T > 0.5, d1 > 0.5, κ > 0, d̂T ≠ 1    (3.113)

See LV, theorem 1, who show that the test statistic(s) is, again, normally
distributed in large samples.
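The regressor z_{t−1}(d1) in (3.108) is a one-sided filter of Δyt; its lag-j weight is A_j^{(d1−1)}/(1 − d1), which tends to 1/j as d1 → 1, consistent with the L'Hôpital limit leading to (3.111). A small sketch verifying the weights:

```python
def frac_coeffs(d, n):
    # A_r^{(d)} = (-1)^r C(d, r), via A_r = A_{r-1} * (r - 1 - d) / r
    a = [1.0]
    for r in range(1, n + 1):
        a.append(a[-1] * (r - 1 - d) / r)
    return a

def efdf_weights(d1, n):
    """Lag-j coefficients, j = 1..n, of (Delta^{d1-1} - 1) / (1 - d1),
    the filter applied to Delta y_t to form z_{t-1}(d1) in (3.108)."""
    a = frac_coeffs(d1 - 1.0, n)
    return [a[j] / (1.0 - d1) for j in range(1, n + 1)]
```

The lag-1 weight is exactly 1 for any d1, and for d1 close to 1 the weights are close to the hyperbolic pattern 1/j of the limiting regression (3.111).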
ϕ(L)zt = εt 1_{(t>0)}

Alternatively, LV suggest first estimating the coefficients in 1 − ϕ(L) = Σ_{j=1}^p ϕ_j L^j,
then using the estimates ϕ̂_j of ϕ_j in the second stage to form η[ϕ̂(L)z_{t−1}(d̂T)]:

Δyt = η[ϕ̂(L)z_{t−1}(d̂T)] + Σ_{j=1}^p ϕ_j Δy_{t−j} + v̂t    (second step)    (3.121)

z_{t−1}(d̂T) = [(Δ^{d̂T−1} − 1)/(1 − d̂T)] Δyt    (3.122)
Notice that the coefficients ϕ̂_j are not used in the second set of terms as, in this
case, they relate to the null value d0 = 1, with Δ^{d0} y_{t−j} = Δy_{t−j}. As before, the test
statistic is the t test, tη(d̂T), on the η coefficient. Also, DGM suggest a one-step
procedure; for details see DGM (2008, section 3.2.2 and 2009, section 12.6). Again:

tη(d̂T) ⇒_D N(0, 1)
Note that the DGP so far considered has assumed that there are no deterministic
components in the mean. In practice this is unlikely. As in the case of standard
DF tests, the leading specification in this context is to allow the fractionally inte-
grated process to apply to the deviations from a deterministic mean function,
usually a polynomial function of time, as in Equation (3.42). For example, the
ARFIMA(p, d, q) model with mean or trend function µt is:

θ(L)^{−1} ϕ(L) Δ^d ỹt = εt    (3.124)

ỹt ≡ yt − µt    (3.125)

and, in the simplest case with ϕ(L) = θ(L) = 1:

Δ^d ỹt = εt    (3.126)

Thus, as in the standard unit root framework, the subsequent analysis is in terms
of observations that are interpreted as deviations from the mean (or detrended
observations), ỹt. As usual, the cases likely to be encountered in practice are
µt = 0, µt = y0 and µt = β0 + β1 t. The presence of a mean function has
implications for the generation of data for the regressor(s) in the FDF and EFDF
tests. It is now necessary to generate a regressor variable of the form Δ^{d1} ỹ_{t−1},
where ỹt = yt − µt, and application will require a consistent estimator µ̂t of µt
and, hence, ŷt = yt − µ̂t.
The simplest case to consider is µt = y0. Then the generating model is:

yt = y0 + w(L)Δ^{−d} εt    (3.127)

Notice that E(yt) = y0, given E(εt) = 0, so the initial condition can be viewed
as the mean of the process. Let Δ_s^d be the truncated (finite sample) operator
(defined below); then it does not follow that Δ_s^d ỹt = Δ_s^d yt, except asymptotically
and given a condition on d. The implication is that adjusting the data may be
desirable to protect against finite sample effects (and is anyway essential in the
case of a trend function). The notation in this section will explicitly acknowledge
the finite sample nature of the Δ^d operator, a convention that was previously
left implicit.
The argument is as follows. First define the truncated binomial operator Δ_s^d,
which is appropriate for a finite sample, that is:

Δ_s^d = Σ_{r=0}^s A_r^{(d)} L^r    (3.128)

⇒ Δ_s^d yt = Σ_{r=0}^s A_r^{(d)} y_{t−r} = yt + A_1^{(d)} y_{t−1} + · · · + A_s^{(d)} y_{t−s}    (3.129)

and Δ_s^d(1) = Σ_{r=0}^s A_r^{(d)}, that is, Δ_s^d evaluated at L = 1.
Next consider the definition ỹt ≡ yt − y0 (the argument is relevant for y0
known), so that:

Δ_s^d ỹt = Δ_s^d (yt − y0)

Hence, Δ_s^d ỹt and Δ_s^d yt differ by −Δ_s^d y0 = −y0 Δ_s^d(1), which is not generally 0,
as it is for d = 1 (that is, the standard first difference operator, for which Δ_s^1(1) = 0).
Using a result derived in Q3.5.a, Δ_s^d(1) = A_s^{(d−1)} and, therefore:

Δ_s^d ỹt − Δ_s^d yt = −A_s^{(d−1)} y0    (3.131)

Δ_s^d(1) = A_s^{(d−1)} → 0 is sufficient for A_s^{(d−1)} y0 → 0 as s → ∞. Referring to Equation
(3.20), but with d − 1 in place of d, A_s^{(d−1)} → 0 if d > 0. If s = t − 1, as is usually the case,
then Δ_{t−1}^d ỹt − Δ_{t−1}^d yt = −A_{t−1}^{(d−1)} y0.
The result in (3.131) implies the following for s = t − 1 and the simple FI(d)
process Δ_s^d ỹt = εt:

Δ_{t−1}^d yt = Δ_{t−1}^d(1) y0 + εt
           = A_{t−1}^{(d−1)} y0 + εt
and for t = 2, . . . , T, this gives rise to the sequence:

Δ_1^d y2 = A_1^{(d−1)} y0 + ε2,  Δ_2^d y3 = A_2^{(d−1)} y0 + ε3,  . . . ,  Δ_{T−1}^d yT = A_{T−1}^{(d−1)} y0 + εT
Notice that even though y0 is a constant, terms of the form A_s^{(d−1)} y0 are not, as
they vary with s. Under the null hypothesis d = 1, all of these terms disappear,
but otherwise they do not vanish for d ∈ [0, 1), becoming less important as d → 1
and as s → ∞. The implication of this argument is that demeaning yt using an
estimator of y0 that is consistent under the alternative should, in general, lead
to a test with better finite sample power properties.
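The identity in (3.131) is easy to verify numerically; the sketch below (illustrative numbers) applies the truncated operator with s = t to both ỹt = yt − y0 and yt, and checks that the difference equals −A_t^{(d−1)} y0:

```python
def frac_coeffs(d, n):
    # A_r^{(d)} = (-1)^r C(d, r), via A_r = A_{r-1} * (r - 1 - d) / r
    a = [1.0]
    for r in range(1, n + 1):
        a.append(a[-1] * (r - 1 - d) / r)
    return a

def trunc_diff(d, y):
    """Truncated Delta_s^d y_t with s = t (all available lags,
    pre-sample values treated as zero)."""
    a = frac_coeffs(d, len(y) - 1)
    return [sum(a[r] * y[t - r] for r in range(t + 1))
            for t in range(len(y))]
```

Because the partial sums of the A_r^{(d)} equal A_s^{(d−1)}, the gap between the two filtered series is exactly −A_t^{(d−1)} y0 at each t, shrinking as t grows when d > 0.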
The linear trend case is now simple to deal with. In this case µt = β0 + β1 t and:

Δ_s^d ỹt = Δ_s^d [yt − (β0 + β1 t)] = Δ_s^d yt − Δ_s^d(1)β0 − β1 Δ_s^d t = Δ_s^d yt − A_s^{(d−1)} β0 − β1 Δ_s^d t

and it is evident that Δ_s^d ỹt ≠ Δ_s^d yt. Even if the finite sample effect of A_s^{(d−1)} β0 is
ignored, the term due to the trend remains.
Shimotsu (2010) has suggested a method of demeaning or detrending that
depends on the value of d. The context of his method is semi-parametric esti-
mation of d, but it also has relevance in the present context. In the case that y0 is
unknown, two possible estimators of y0 are the sample mean ȳ = T^{−1} Σ_{t=1}^T yt and
the first observation, y1. ȳ is a good estimator of y0 when |d| < 0.5, whereas y1 is
a good estimator of y0 when d > 0.75, and both are good estimators for d ∈ (0.5,
0.75); for further details see Chapter 4, Section 4.9.2.iv. Shimotsu (2010) sug-
gests the estimator ŷ0 that weights ȳ and y1, using the weighting function κ(d):

ŷ0 = κ(d)ȳ + [1 − κ(d)]y1    (3.132)
A possibility for κ(d) when d ∈ (½, ¾) is κ(d) = 0.5(1 + cos[4π(d − 0.5)]), which
weights ŷ0 toward y1 as d increases. In practice, d is unknown and is replaced
by an estimator that is consistent for d ∈ (0, 1). For example, replacing d by d̂T
results in an estimated value of y0 given by:

ŷ0 = κ(d̂T)ȳ + [1 − κ(d̂T)]y1

The demeaned data ŷt ≡ yt − ŷ0 are then used in the FDF and EFDF tests, and d̂T
is the value of d required to operationalise the test (or d1 where an estimator is
not used).
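A minimal sketch of this weighting (the cosine form of κ(d) stated above, extended with the natural boundary values κ = 1 for d ≤ ½ and κ = 0 for d ≥ ¾, which is an assumption made here for completeness):

```python
import math

def kappa(d):
    """Weight on the sample mean: 1 for d <= 1/2, 0 for d >= 3/4,
    and the smooth cosine transition of the text in between (assumed
    boundary behaviour outside (1/2, 3/4))."""
    if d <= 0.5:
        return 1.0
    if d >= 0.75:
        return 0.0
    return 0.5 * (1.0 + math.cos(4.0 * math.pi * (d - 0.5)))

def y0_hat(y, d_hat):
    """Shimotsu-style estimator of y0: weighted average of the sample
    mean and the first observation, as in (3.132)."""
    ybar = sum(y) / len(y)
    k = kappa(d_hat)
    return k * ybar + (1.0 - k) * y[0]
```

For small d̂T the estimator is essentially the sample mean; for d̂T near the nonstationary region it shifts toward the first observation, exactly the trade-off described above.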
Next consider the linear trend case, so that µt = β0 + β1 t (note that β0 ≡ y0).
The LS residuals are ŷt = yt − β̂0 − β̂1 t. Since the residuals have a zero sample
mean by the properties of LS regression, the weighting function applied to the
residuals ŷt simplifies relative to the κ(d̂T) weighting of Equation (3.132); thus, let:

ŷ0 = [1 − κ(d̂T)]ŷ1    (3.135)
Under the null, d = 1, so that:

Δyt = β1 + εt    (3.137)

and the LS estimator of β1 is β̂1 = (T − 1)^{−1} Σ_{t=2}^T Δyt. Under H0, β̂1 is consistent
at rate T^{1/2} and under HA it is consistent at rate T^{(3/2)−d} (see DGM, 2008). The
adjusted observations are obtained as:

Δŷt = Δyt − β̂1    (3.138)
The revised regressions for the FDF and EFDF tests (in the simplest cases) using
adjusted data are as follows, where ŷt generically indicates estimated mean or
trend adjusted data as the case requires:

FDF:    Δŷt = γ Δ_{t−1}^{d̂T} ŷ_{t−1} + zt    (3.139)

EFDF:   Δŷt = η z_{t−1}(d̂T) + vt    (3.140)

z_{t−1}(d̂T) = [(Δ_{t−1}^{d̂T−1} − 1)/(1 − d̂T)] Δŷt    (3.141)
As before, the test statistics are the t-type statistics on γ and η, respectively,
considered as functions of d̂T.
In the case of the DF tests, the limiting null distributions depend on µt, leading
to, for example, the test statistics τ̂, τ̂µ and τ̂β; however, that is not the case for
the DGM, LV and LM tests considered here. Provided that the trend function is
replaced by a consistent estimator, say µ̂t, the limiting distributions remain
the same as in the case with µt = 0.
3.7 Locally best invariant (LBI) tests
This section outlines a locally best invariant (LBI) test due to Tanaka (1999)
and Robinson (1991, 1994a), the former in the time domain and the latter in
the frequency domain. This test is also asymptotically uniformly most powerful
invariant, UMPI. The test, referred to generically as an LM test, is particularly
simple to construct in the case of no short-run dependence in the error; and
whilst it is somewhat more complex in the presence of short-run dynamics, a
development due to Agiakloglou and Newbold (1994) and Breitung and Hassler
(2002) leads to a test statistic that can be readily computed from a regression
that is analogous to the ADF regression in the standard case.
An advantage of an LM-type test is that it only requires estimation under the
null hypothesis, which is particularly attractive in this case as the null hypoth-
esis leads to the simplification that the estimated model uses first differenced
data. This contrasts with test statistics that are based on the Wald principle, such
as the DGM and LV versions of the FDF test, which also require specification
and, practically, estimation of the value of d under the alternative hypothesis.
Tanaka’s results hold for any d; specifically, for typical economic time series
where d ≥ 0.5, and usually d ∈ [0.5, 1.5), so that d is in the nonstationary
region. The asymptotic results do not require Gaussianity.
(1 − L)^{d+c} yt = εt    (3.142)

The case emphasised so far has been d = d0 = 1, that is, there is a unit root, and a
test of the null hypothesis H0: c = 0 against the alternative HA: c < 0 is, in effect, a
test of the unit root null against the left-sided alternative, with local alternatives
of the form c = δ/√T. However, other cases can be treated within this framework,
for example H0: d = d0 = ½, so that the process is borderline nonstationary,
against HA: d = d0 + c with c < 0, implying that the generating process is stationary;
or again with d0 = 1 but HA: d = d0 + c, c > 0, so that the alternative is explosive.
Hereafter, the initialization εt 1t>0 will be left implicit; and note that normality
is not required for the asymptotic results to hold and can be replaced with an
iid assumption. The (conditional) log-likelihood function is:
LL({yt}; c, σ_ε²) = −(T/2) log(2πσ_ε²) − [1/(2σ_ε²)] Σ_{t=1}^T [(1 − L)^{d+c} yt]²    (3.144)
Let xt ≡ (1 − L)^d yt; then the first derivative of LL(·) with respect to c (the score),
evaluated at c = 0, is:

ST1 ≡ ∂LL({yt}; c, σ_ε²)/∂c |_{H0: c = 0, σ_ε² = σ̂_ε²}
    = (1/σ̂_ε²) Σ_{t=2}^T xt Σ_{j=1}^{t−1} j^{−1} x_{t−j}
    = T Σ_{j=1}^{T−1} j^{−1} ρ̂(j)    (3.145)

ρ̂(j) = Σ_{t=j+1}^T xt x_{t−j} / Σ_{t=1}^T xt²    (3.146)

σ̂_ε² = T^{−1} Σ_{t=2}^T (Δ^d yt)² = T^{−1} Σ_{t=2}^T xt²
ρ̂(j) is the j-th autocorrelation coefficient of {xt }; more generally ρ̂(j) is the j-th
autocorrelation coefficient of the residuals (obtained under H0 ) and σ̂ε2 is an
estimator of σε2 .
In the case of the unit root null, H0: d = d0 = 1, we have xt = Δyt and, therefore,
ρ̂(j) simplifies to:

ρ̂(j) = Σ_{t=j+1}^T Δyt Δy_{t−j} / Σ_{t=1}^T (Δyt)²    (3.147)
ST1 = (1/σ̂_ε²) Σ_{t=2}^T xt x*_{t−1}    (3.149)

(See Q3.7.) Apart from the scale factor σ̂_ε^{−2}, this is the numerator of the
least squares estimator of the coefficient in the regression of xt on x*_{t−1}. A
suitably scaled version of ST1 gives the LM statistic:

LM0 ⇒_D N(δ(π²/6)^{1/2}, 1)    (3.151)
Hence, under the null δ = 0 and LM0 ⇒D N(0, 1), leading to the standard decision
rule that, with asymptotic size α, reject H0 : c = 0 against HA : c < 0 if LM0 < zα ,
and reject H0 against HA : c > 0 if LM0 > z1−α , where zα and z1−α are the lower
and upper quantiles of the standard normal distribution. Alternatively for a
two-sided alternative take the square of LM0 and compare this with the critical
values from χ2 (1), rejecting for ‘large’ values of the test statistic.
The LM0 test is locally best invariant, LBI, and asymptotically uniformly
most powerful invariant, UMPI. For a detailed explanation of these terms, see
Hatanaka (1996, especially section 3.4.1). In summary:
locally: the alternatives are local to d0, that is d0 + c, where c > 0 (or c < 0), but
c ≈ 0; the ‘local’ parameterization here is c = δ/√T.
best: considering the power function of the test statistic at d0 and d0 + c, the
critical region for a size α test is chosen such that the slope of the power function
is steepest.
asymptotically: as T → ∞.
Tanaka (1999, theorem 3.2 and corollary 3.2) shows that asymptotically the
power of LM0 coincides with the power envelope of the locally best invariant
test.
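In the simple case of no short-run dynamics, the LM0 statistic is just √T Σ_{j} j^{−1}ρ̂(j) scaled by ω = (π²/6)^{1/2}; a self-contained sketch, taking xt = (1 − L)^{d0} yt as given:

```python
import math

def lm0(x):
    """LM0 = sqrt(T) * sum_{j=1}^{T-1} rho_hat(j)/j, divided by omega,
    with omega^2 = pi^2/6 (the no-short-run-dynamics case)."""
    T = len(x)
    denom = sum(v * v for v in x)
    s = 0.0
    for j in range(1, T):
        rho_j = sum(x[t] * x[t - j] for t in range(j, T)) / denom
        s += rho_j / j
    omega = math.sqrt(math.pi ** 2 / 6.0)
    return math.sqrt(T) * s / omega
```

Negative first-order autocorrelation in xt pushes LM0 below zero, the direction associated with rejection in favour of c < 0.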
x = Z̃β + ε    (3.154)
ST1 ≡ ∂LL({yt}; c, σ_ε²)/∂c |_{H0: c = 0, σ_ε² = σ̂_ε²}
    = −(1/σ̂_ε²) Σ_{t=1}^T [ln(1 − L)(1 − L)^d (yt − Zt β̂)] (1 − L)^d (yt − Zt β̂)
    = −(1/σ̂_ε²) Σ_{t=1}^T [ln(1 − L)ε̂t] ε̂t
    = T Σ_{j=1}^{T−1} j^{−1} ρ̂(j)    (3.155)
where ρ̂(j) = Σ_{t=j+1}^T ε̂t ε̂_{t−j} / Σ_{t=1}^T ε̂t² is the j-th autocorrelation coefficient of the
residuals ε̂t (the derivation of the last line is the subject of an end-of-chapter
question). H0: c = 0 is rejected against c > 0 for large values of ST1 and rejected
against c < 0 for small values of ST1. As noted above, a suitably scaled version
of ST1 has a standard normal distribution, which then provides the appropriate
quantiles.
We can now consider some special cases. The first case, of no determinis-
tics, has already been considered: there Z is empty and y is generated
as Δ^d y = ε. In this case, the value of d is given under the null and no param-
eters are estimated. The test statistic is based on x = Δ^d y, so that for d = 1,
x = Δy = (y2 − y1, . . . , yT − y_{T−1})′. It is also important to consider how to deal
with the presence of a linear trend (the test statistics are invariant to y0 ≠ 0).
Suppose the generating process is:
yt = β0 + β1 t + Δ^{−d} εt    (3.156)
   = Zt β + Δ^{−d} εt

where Zt = (1, t) and β = (β0, β1)′. β̂ is obtained as β̂ = (Z̃′Z̃)^{−1} Z̃′x where, under the
unit root null d = 1, x = Δy and Z̃ = ΔZ = (z̃1, z̃2), with z̃1 a column of 0s and
z̃2 a column of 1s; the first element of β̂ is, therefore, annihilated and β̂1 is the
LS coefficient from the regression of Δyt on a constant, equivalently the sample
mean of Δyt. The residuals on which the LM test is based are, therefore, ε̂t =
xt − Z̃t β̂ = Δyt − β̂1, and the adjustment is, therefore, as described in Section 3.6.
LL({yt}; c, ψ, σ_ε²) = −(T/2) log(2πσ_ε²) − [1/(2σ_ε²)] Σ_{t=1}^T [θ(L)^{−1}ϕ(L)(1 − L)^{d0+c} yt]²    (3.158)
where ψ = (ϕ′, θ′)′ collects the coefficients in ϕ(L) and θ(L). The score with respect
to c is as before, but with the residuals now defined to reflect estimation of the
short-run dynamics:

ST2 ≡ ∂LL({yt}; c, ψ, σ_ε²)/∂c |_{H0: c = 0, σ_ε² = σ̂_ε², ψ = ψ̂}
    = T Σ_{j=1}^{T−1} j^{−1} ρ̂(j)    (3.159)
where ρ̂(j) is the j-th order autocorrelation coefficient of the residuals, given by:

ρ̂(j) = Σ_{t=j+1}^T ε̂t ε̂_{t−j} / Σ_{t=1}^T ε̂t²    (3.160)
Notice that the residuals are estimated using the null value d0 , hence d itself is
not a parameter that is estimated. Then, see Tanaka (op. cit., theorem 3.3):
ST2/√T = √T Σ_{j=1}^{T−1} j^{−1} ρ̂(j)    (3.162)
       ⇒_D N(δω², ω²),  where c = δ/√T

so that, on division by ω:

LM0 ⇒_D N(δω, 1)

and if δ = 0, then:

LM0 ⇒_D N(0, 1)
where gj and hj are the coefficients on Lj in the expansions of ϕ(L)^{−1} and θ(L)^{−1},
respectively, and the matrix appearing in (3.163) is the Fisher information matrix
for ψ = (ϕ′, θ′)′.
To operationalise (3.163) following the LM principle, a consistent estimate of
ω under the null denoted ω̂0 is substituted for ω, resulting in the feasible LM
test statistic, LM0,0 :
LM0,0 = (√T/ω̂0) Σ_{j=1}^{T−1} j^{−1} ρ̂(j)    (3.166)
      ⇒_D N(δω, 1)
The test can also be implemented through a regression approach due to Agiakloglou and
Newbold (1994), as developed by Breitung and Hassler (2002). First consider the simple case in which the data are generated as
(1 − L)^{d+c} yt = εt, and note that the structure of ST1 suggested the least squares
regression of xt on x*_{t−1}, that is:

xt = α x*_{t−1} + et    (3.167)

xt ≡ Δ^d yt    (3.168)

x*_{t−1} ≡ Σ_{j=1}^{t−1} j^{−1} x_{t−j}    (3.169)
The least squares estimator and associated test statistic for H0: c = 0 against HA:
c ≠ 0 translate to H0: α = 0 and HA: α ≠ 0. The LS estimator of α and the square
of the t statistic for α̂ are, respectively:

α̂ = Σ_{t=2}^T xt x*_{t−1} / Σ_{t=2}^T (x*_{t−1})²    (3.170)

tα² = [Σ_{t=2}^T xt x*_{t−1}]² / [σ̂e² Σ_{t=2}^T (x*_{t−1})²]    (3.171)

where σ̂e² = T^{−1} Σ_{t=2}^T êt² and êt = xt − α̂ x*_{t−1}.
Breitung and Hassler (2002, theorem 1) show that for data generated as
(1 − L)^{d+c} yt = εt 1_{(t>0)}, with εt ∼ (0, σ_ε²) and σ_ε² < ∞, the t statistic tα is
asymptotically standard normal under the null; the last line of the derivation
assumes the initialization y0 = 0. Thus, the difference between
the two approaches is that the regressor y*_{t−1} weights the lagged first differences,
with weights that decline in the pattern (1, ½, ⅓, . . . , 1/(t − 1)).
Note that the test statistic is here presented in the form tα², as HA: c ≠ 0;
however, the more likely case is HA: c < 0, in which case the test statistic tα can be used
with left-sided critical values from N(0, 1).
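The regression (3.167)-(3.169) is straightforward to set up; the sketch below builds x*_{t−1} with the declining weights (1, ½, ⅓, …) and returns tα (illustrative data only):

```python
import math

def bh_t_stat(x):
    """t statistic on alpha in the regression x_t = alpha * x*_{t-1} + e_t,
    where x*_{t-1} = sum_{j=1}^{t-1} x_{t-j} / j (Breitung-Hassler form),
    with pre-sample values of x treated as zero."""
    T = len(x)
    pairs = []
    for t in range(1, T):
        xstar = sum(x[t - j] / j for j in range(1, t + 1))  # weights 1, 1/2, 1/3, ...
        pairs.append((x[t], xstar))
    num = sum(a * b for a, b in pairs)
    den = sum(b * b for _, b in pairs)
    alpha = num / den
    s2 = sum((a - alpha * b) ** 2 for a, b in pairs) / T
    return alpha / math.sqrt(s2 / den)
```

With xt = Δyt (the unit root null d0 = 1), tα is compared with left-sided N(0, 1) critical values, exactly as described above.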
xt ≡ (1 − L)^d yt

Then under the null d = d0 and xt ≡ (1 − L)^{d0} yt can be formed. The resulting
p-th order autoregression in xt is:

xt = ϕ1 x_{t−1} + · · · + ϕp x_{t−p} + εt    (3.179)

Notice that the regressors x_{t−i} are also included in the LM regression. (See
Wooldridge, 2011, for a discussion of this point for stochastic regressors.) The
suggested test statistics are tα or tα2 , which are asymptotically distributed as
N(0, 1) and χ2 (1) respectively.
In the case of the invertible ARFIMA(p, d, q) model, the generating process is:
ϕ(L)(1 − L)^d yt = θ(L)εt, with xt ≡ (1 − L)^d yt
Then, as before, under the null d = d0 and xt ≡ (1 − L)^{d0} yt. In this case A(L) is
an infinite order lag polynomial and, practically, is approximated by truncating
an infinite order lag polynomial and, practically, is approximated by truncating
the lag to a finite order, say k. A reasonable conjecture based on existing results
is that provided the order of the approximating lag polynomial expands at an
appropriate rate, then the limiting null distributions of tα and tα2 are maintained
as in the no serial correlation case. The suggested rate for the ADF(k) test is
k → ∞ as T → ∞ such that k/T^{1/3} → 0 (see also Ng and Perron (2001) and UR Vol. 1,
chapter 9).
td̂ = (d̂ − d0)/(ω^{−1}/√T) (3.186)
    = √T ω(d̂ − d0) (3.187)
    ⇒D N(0, 1)
The first statement, (3.185), says that √T(d̂ − d) is asymptotically normally dis-
tributed with zero mean and variance ω^{−2}. The ‘asymptotic variance’ of √T d̂
is ω^{−2} and the ‘asymptotic standard error’ of d̂ is ω^{−1}/√T; therefore, standardising
(d̂ − d0) by ω^{−1}/√T gives the t-type statistic, denoted td̂ in (3.186). This results in
a quantity that is asymptotically distributed as standard normal. In prac-
tice, one could use the estimator of ω^{−1} obtained from ML estimation to obtain
a feasible (finite sample) version of td̂.
In the simplest case ω² = π²/6, therefore ω^{−2} = (π²/6)^{−1} = 6/π², √T(d̂ −
d) ⇒D N(0, 6/π²) and the test statistic is td̂ = 1.2825√T(d̂ − d0) ⇒D N(0, 1),
where 1.2825 = (π²/6)^{1/2} = ω. The more general case is considered in a question (see Q3.8).
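In code, the simplest-case Wald statistic is just a rescaling of (d̂ − d0); a sketch with purely illustrative input values (the function name is not from any package):

```python
import math

def tanaka_wald(d_hat, d0, T, omega2=math.pi ** 2 / 6):
    """t-type Wald statistic: sqrt(T) * omega * (d_hat - d0) => N(0, 1).
    In the simplest case omega = sqrt(pi^2/6) ~= 1.2825."""
    return math.sqrt(T) * math.sqrt(omega2) * (d_hat - d0)

# illustrative numbers only: suppose d_hat = 0.93 from a sample of T = 200
t = tanaka_wald(0.93, 1.0, 200)   # negative: evidence against d0 = 1
```

With a left-sided alternative (d < 1), rejection occurs when t falls below the chosen N(0, 1) quantile, e.g. −1.645 at 5%.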
Tanaka (op. cit.) reports some simulation results to evaluate the Wald and
LM tests (in the forms LM0 and LM0,0 ) and compare them with the limiting
power envelope. The data-generating process is an ARFIMA(1, d, 0) model, with
niid input, T = 100, and 1,000 replications with a nominal size of 5%. The
results most likely to be of interest in an economic context are those around the
stationary/nonstationary boundary, with H0 : d0 = 0.5 against HA : dA > 0.5 and,
alternately, H0 : d0 = 1 against HA : dA < 1.
With a simple fractional integrated process, the simulated size is 5.7% for
LM0 and 3.9% for td̂ using ω2 = (π2 /6), and 3.9% and 3.5%, respectively, when
ω2 is estimated. In terms of size-unadjusted power, the LM test is better than
the Wald test, for example when d = 0.6, 36.2% and 30.7% respectively using
ω2 = (π2 /6), and 30% and 29.3% when ω2 is estimated. Given the difference in
nominal size, it may well be the case that there is little to choose between the
tests in terms of size-adjusted power.
In the case of the unit root test against a fractional alternative, ARFIMA(1, d,
0), Tanaka (op. cit.) reports the results with ϕ1 = 0.6 and, alternately, ϕ1 = −0.8;
we concentrate on the former as that corresponds to the more likely case of
positive serial correlation in the short-run dynamics. The simulation results
suggest a sample of T = 100 is not large enough to get close to the asymptotic
results. For example, the empirical sizes of LM0 and LM0,0 were 1.4% and 2.2%,
whereas the corresponding figures for the Wald test were 9.8% and 16.1%; these
differences in size suggest that a power comparison could be misleading.
3.8 Power
where ARE(t̂1, t̂2, d) > 1 implies that test t̂1 is asymptotically more powerful than
test t̂2. The ARE is a function of the ratio of the non-centrality parameters, so that t̂1
is asymptotically more powerful than t̂2 if |nc1| > |nc2| (because of the nature
of HA, large negative values of the test statistic lead to rejection, thus nci < 0).
DGM (2008, figures 1 and 2) show that the non-centrality parameters of tγ (d1 ),
tη (d1 ) and LM0 , say ncγ , ncη and nc0 , are similar for d ∈ (0.9, 1), but for d ∈ (0.3,
0.9) approximately, |ncη | > |ncγ | > |nc0 |, whereas for d ∈ (0.0, 0.3), |ncη | ≈ |ncγ |
> |nc0 |. In turn these imply that in terms of asymptotic power tη (d1 ) ≈ tγ (d1 )
≈ LM0 for d ∈ (0.9, 1); tη (d1 ) > tγ (d1 ) > LM0 for d ∈ (0.3, 0.9); and tη (d1 ) ≈
tγ (d1 ) > LM0 for d ∈ (0.0, 0.3). Perhaps of note is that there is little to choose
(asymptotically) between the tests close to the unit root for fixed alternatives.
3.8.2 Power against local alternatives
When the null hypothesis is that of a (single) unit root, so that Δyt = εt, then
the local alternative is:
Δ^d yt = εt (3.189)
d = 1 + δ/√T, δ < 0 (3.190)
where εt is iid with finite fourth moment. This is as in the LM test, with d0 = 1.
As T → ∞ then d → 1, which is the value under H0 , so that there is a sequence
of alternative distributions that get closer and closer to that under H0 as T → ∞.
The test statistic tη (d̂T ) is asymptotically equivalent to the Robinson/Tanaka LM
test which is, as noted above, UMPI under a sequence of local alternatives. To
see this equivalence it is helpful to consider the distributions of the test statistics
under the local alternatives.
Let the DGP be Δ^d yt = εt 1(t>0); then under the sequence of local alternatives
d = 1 + δ/√T, δ < 0, d1 ≥ 0.5, the limiting distributions are as follows:
LM
LM0 ⇒D N(δh0, 1)
h0 = (Σ_{j=1}^∞ j^{−2})^{1/2} = (π²/6)^{1/2} = 1.2825 (3.191)
FDF
tγ(d) ⇒D N(λ1, 1), d = 1 + δ/√T
λ1 = δh1, h1 = 1 (3.192)
tγ(d1) ⇒D N(λ2(d1), 1)
λ2(d1) = δh2(d1)
h2(d1) = Γ(d1)/[d1 Γ(2d1 − 1)^{1/2}], d1 > 0.5 (3.193)
EFDF
d̂T = d2 + op(T^{−κ}), κ > 0, d̂T > 0.5
The Aj(d) coefficients are defined in Equations (3.4) and (3.5); tγ(d) is computed
from the regression of Δyt on Δ^d yt−1, d = 1 + δ/√T; whereas tγ(d1) is computed
from the regression of Δyt on Δ^{d1} yt−1; tη(d̂T) is computed from the regression
of Δyt on zt−1(d̂T); and LM0 is given by (3.151). The input values d1 and d2 are
distinguished as they need not be the same in the two tests.
The effect of the local alternative on the distributions of the test statistics is
to shift each distribution by a non-centrality parameter λi , which depends on
δ and a drift function denoted, respectively, as h0 , h1 , h2 (d1 ) and h3 (d2 ); note
that the first two are just constants. In each case, the larger the shift in λi (dj )
relative to the null value, δ = 0, the higher the power of the test.
The drift functions are plotted in Figure 3.9. Note that h2 (d1 ) and h3 (d2 ) are
everywhere below h0 , but that h3 (d2 ) → h0 as d → 1, confirming the asymptotic
local equivalence of tη (d̂T ) and LM0 , given a consistent estimator of d for the
former. The drift functions of tγ (d1 ) and tη (d̂T ) cross over at d = 0.77, indicating
that tη (d̂T ) is not everywhere more powerful than tγ (d1 ), with the former more
powerful for d ∈ (0.5, 0.77), but tη (d̂T ) more powerful closer to the unit root.
Note also that there is a unique maximum value of d1 for h2(d1), say d1∗,
which occurs at d1∗ ≈ 0.69145 and, correspondingly, h2(d1∗) = 1.2456, at which
the power ratio of tγ(d1) relative to LM0 is 97%. What is also clear is that d1 = 0.5
is a particularly poor choice as an input for tγ(d1) since λ2(0.5) = 0. Comparing tγ(d)
and tγ(d1∗), the squared ratio of the relative non-centrality parameters, which
gives the asymptotic relative efficiency, is (1/1.2456)² = 0.6445; note that the
power of tγ(d1) is greater than that of tγ(d) for d1 ∈ (0.56, 1.0).
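The shape of the drift function is easy to verify numerically; a sketch assuming h2(d1) = Γ(d1)/[d1 Γ(2d1 − 1)^{1/2}], which is consistent with the reported maximum h2(d1∗) = 1.2456 at d1∗ ≈ 0.69145:

```python
import math

def h2(d1):
    """Drift function h2(d1) = Gamma(d1) / (d1 * Gamma(2*d1 - 1)**0.5),
    defined for d1 > 0.5; computed in logs for numerical stability."""
    return math.exp(math.lgamma(d1) - math.log(d1) - 0.5 * math.lgamma(2 * d1 - 1))

# grid search for the unique maximiser d1* on (0.5, 1]
grid = [0.5 + 0.0001 * i for i in range(1, 5001)]
d_star = max(grid, key=h2)
# d_star is close to 0.6915, with h2(d_star) close to 1.2456,
# below the LM drift bound h0 = (pi^2/6)**0.5 ~= 1.2825
```

Note that h2(1) = 1, matching h1 = 1, and h2(d1) → 0 as d1 → 0.5, matching the observation that d1 = 0.5 is a poor input choice.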
Figure 3.9 Drift functions h(d1): h0 (upper limit), h1, h2 and h3, plotted for d over (0.5, 1)
d1∗, is a function of d and is written as d1∗(d) to emphasise this point. LV (2006)
find that this relationship is in effect linear over the range of interest (d ≥ 0.5)
and can be fitted by d̂1∗(d) = −0.031 + 0.719d; thus, for a specific alternative
d = d1, the ‘best’ input value for the FDF regressor is d1∗ = −0.031 + 0.719d1. In
the composite alternative case, d1 is estimated by a T^κ-consistent estimator d̂T,
with κ > 0, and the optimal selection of the input value is:
d̂1∗ = −0.031 + 0.719d̂T
To emphasise, note that in this procedure it is not the consistent estimator d̂T
that is used to construct the regressor (nor is it d = d1 in the simple alternative),
but a simple linear transform of that value, which will be less than d̂T (or d1).
Using this value does not alter the limiting standard normal distribution (LV,
2006, lemma 1).
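The selection rule is a one-line transform; a sketch using the coefficients as fitted by LV (2006):

```python
def optimal_fdf_input(d_hat):
    """LV (2006) linear rule for the 'best' FDF input value:
    d1* = -0.031 + 0.719 * d_hat (fitted for the range d >= 0.5)."""
    return -0.031 + 0.719 * d_hat

d1_input = optimal_fdf_input(1.0)   # 0.688, close to the p = 0 optimum 0.69145
```

The rule always returns a value below its argument on the relevant range, as the text notes.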
In the more general case with ϕ(L)Δ^d yt = εt 1(t>0), the optimal d1 depends on
the order of ϕ(L) for local alternatives and the values of the ϕj coefficients for
fixed alternatives. LV (2006) suggest a prewhitening approach, in which the first
step is to estimate ϕ(L) from a p-th order autoregression using Δyt, that is, under
the null d = 1:
ϕ(L)Δyt = εt ⇒ Δyt = Σ_{j=1}^p ϕ̂j Δyt−j + ε̂t
where ˆ denotes a LS estimator. The second step is to form the prewhitened data,
y̆t = ϕ̂(L)yt = yt − Σ_{j=1}^p ϕ̂j yt−j. The final step is to run an AFDF regression,
but with y̆t rather than yt, that is:
Δy̆t = γΔ^{d1} y̆t−1 + Σ_{j=1}^p ϕj Δyt−j + ut∗(d1) (3.196)
(Note that this differs from DGM’s, 2002, AFDF in that it uses y̆t rather than
yt; also note that yt may be subject to a prior adjustment for deterministic
components.) As before, the test statistic is the t-type test associated with γ,
tγ(d1), and the inclusion of lags of Δyt controls the size of the test. The revised
AFDF regression is referred to as the prewhitened AFDF, PAFDF.
Table 3.3 Optimal d1, d1∗, for use in PAFDF tγ(d1) tests (for local alternatives)
p    0      1  2      3  4  5
d1∗  0.691  —  0.901  —  —  —
In the case of a sequence of local alternatives, the optimal d1 depends on the
order p of ϕ(L), but not on the values of the ϕj coefficients themselves, so that d1∗ ≈
0.69145 can be seen to correspond to the special case where p = 0, as shown in
Table 3.3 for p ≤ 5.
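The three prewhitening steps can be sketched as follows; a minimal sketch with LS throughout, Δ^{d1} applied via the truncated binomial expansion, and illustrative function names and AR order:

```python
import numpy as np

def fracdiff(x, d):
    """(1 - L)^d x via the truncated binomial expansion:
    pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j."""
    n = len(x)
    pi = np.ones(n)
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.array([pi[:t + 1] @ x[t::-1] for t in range(n)])

def pafdf_tstat(y, d1, p):
    """PAFDF t-statistic: (i) LS fit of an AR(p) to dy_t under the null d = 1;
    (ii) prewhiten the levels, y_breve = phihat(L) y; (iii) regress
    d(y_breve)_t on d^{d1} y_breve_{t-1} and p lags of dy_t; return the
    t-ratio on gamma."""
    dy = np.diff(y)
    # step (i): AR(p) on the first differences
    Z = np.column_stack([dy[p - j:-j] for j in range(1, p + 1)])
    phi = np.linalg.lstsq(Z, dy[p:], rcond=None)[0]
    # step (ii): prewhitened level series
    yb = y[p:] - sum(ph * y[p - j:-j] for j, ph in enumerate(phi, 1))
    # step (iii): AFDF regression with the prewhitened series
    lhs = np.diff(yb)
    z = fracdiff(yb, d1)[:-1]          # lagged d1-difference of y_breve
    lags = np.column_stack([dy[p - j:p - j + len(lhs)] for j in range(1, p + 1)])
    X = np.column_stack([z, lags])
    b, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    u = lhs - X @ b
    s2 = u @ u / (len(lhs) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return b[0] / se

rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal(300))   # a unit-root null for illustration
t_gamma = pafdf_tstat(y, d1=0.901, p=2)   # compare with the left tail of N(0, 1)
```

Under the null the t-ratio is approximately standard normal, so left-sided N(0, 1) critical values apply.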
In the case of fixed alternatives, there is no single function such as (3.195),
but a function that varies with p and the values of ϕj. LV (2006, section
4), to which the reader is referred for details, suggest an algorithm that leads
to the automatic selection of d1∗; they note that the method is computationally
intensive, but can offer power improvements in finite samples.
The estimators of d to be used are the local Whittle (LW) estimator and, in
contrast, the exact LW estimator of Shimotsu and Phillips (2005) and
Shimotsu (2010), referred to as d̂LW and d̂ELW, respectively (see Chapter 4). In
the case of d̂LW, the data are differenced and one is added back to the estimator,
whereas this is not necessary for d̂ELW, which is consistent for d in the nonsta-
tionary region. The FDF tests were relatively insensitive to the choice of d̂T, so
a comparison is only made for the EFDF tests.
The tests to be considered are:
FDF: tγ (d1∗ ) where d1∗ = 0.69145; tγ (d̂1∗ ) based on d̂LW ; and tγ (d̂LW ).
EFDF: tη (d̂LW ) and tη (d̂ELW ).
LM: Tanaka’s LM0 and the Breitung-Hassler regression version tα .
The DGPs are variations of the following:
Δ^d[yt − µt] = ut (3.197)
ŷt ≡ yt − µ̂t (3.199)
The first case to be considered is when µt = y0 and the estimators are as described
in Section 3.6. The simulations are, with one exception, invariant to the value
of y0. The exception arises when using the test based on d̂ELW. To see why,
compare this method with that based on d̂LW ; in the latter case, the data are
first differenced and then 1 is added back to the estimator, thus, like the LM
tests, invariance to a non-zero initial value is achieved by differencing. In the
case of d̂ELW , the estimator is asymptotically invariant to a non-zero initial value
for d ≥ 0.5, but Shimotsu (2010) recommends demeaning by subtracting y1 from
yt to protect against finite sample effects. Thus, if it is known that y0 = 0, there
would be no need to use the Shimotsu demeaned data; however, in general this
is not the case (and it is not the case here), so the data is demeaned both in the
estimation of d and in the construction of the tests. The results, based on 5,000
replications, are given in Table 3.4a for T = 200 and Table 3.4b for T = 500, and
illustrated in Figures 3.10a and 3.10b.
The central case is T = 200. The empirical sizes of tγ (d1∗ ), tγ (d̂LW ) and tα are
close to the nominal size of 5%, whereas tγ (d̂1∗ ), tη (d̂LW ) and tη (d̂ELW ) are slightly
oversized, whilst LM0 is undersized. Relative power differs across the parameter
space. In terms of size-adjusted power (given the differences in nominal size),
for d close to the unit root, say d ∈ (0.9, 1.0), tγ (d1∗ ) is best, but it loses a clear
advantage as d decreases, with a marginal advantage to tη (d̂LW ) and tα . The least
powerful of the tests is tγ (d̂LW ). Figure 3.10a shows how, with one exception, the
power of the tests clusters together, whereas the region d ∈ (0.9, 1.0) is shown
in Figure 3.10b.
The results for T = 500 (see Table 3.4b for a summary), show that the tests, with
the exception of tγ (d̂LW ), which has the lowest power, are now quite difficult to
Table 3.4a Power of various tests for fractional d (demeaned data), T = 200
d     tγ(d1∗)  tγ(d̂LW)  tγ(d̂1∗)  tη(d̂LW)  tη(d̂ELW)  LM0    tα
power
0.90  0.491    0.424     0.587     0.546     0.548      0.403  0.511
0.95  0.210    0.195     0.268     0.228     0.234      0.148  0.212
1.00  0.049    0.053     0.079     0.067     0.069      0.030  0.054
size-adjusted power
0.60  1.000    1.000     1.000     1.000     1.000      1.000  1.000
0.65  1.000    1.000     1.000     1.000     1.000      1.000  1.000
0.70  0.999    0.998     1.000     1.000     1.000      0.999  0.997
0.75  0.992    0.977     0.994     0.998     0.999      0.996  0.996
0.80  0.937    0.893     0.945     0.959     0.956      0.950  0.954
0.85  0.779    0.691     0.773     0.812     0.805      0.796  0.805
0.90  0.494    0.413     0.467     0.496     0.481      0.483  0.490
0.95  0.214    0.186     0.181     0.195     0.193      0.196  0.199
1.00  0.050    0.050     0.050     0.050     0.050      0.050  0.050
Table 3.4b Power of various tests for fractional d (demeaned data), T = 500
Note: d1∗ = 0.69145, d̂LW is the local Whittle estimator, d̂ELW is the exact local Whittle estimator.
Figure 3.10a Power of the tests, T = 200, d ∈ (0.6, 1.0)
Figure 3.10b Power of the tests, T = 200, d ∈ (0.9, 1.0)
distinguish across the range d ∈ (0.6, 1.0), although tγ(d̂1∗), tη(d̂LW) and tη(d̂ELW)
are still slightly oversized.
The other case likely to arise in practice is that of a possible non-zero deter-
ministic trend, which is illustrated with detrended data. The LM tests now
also require detrending, which is achieved by basing the test on ε̂t = Δyt − β̂1,
where β̂1 is the sample mean of Δyt. The FDF-type tests are based on (Shimotsu)
detrended data obtained as ŷt = yt − (β̂0 + β̂1t); see Section
3.6, Shimotsu (2010) and Chapter 4. The results are summarised in Tables 3.5a
and 3.5b; and power and size-adjusted (s.a.) power are shown in Figures 3.11a and 3.11b for
T = 200.
Starting with T = 200, all of the tests are oversized, but the LM0 test is only
marginally so at 5.5% followed by tγ (d̂LW ) at 6.6% (see the upper panel of Table
3.5a); the other tests have broadly twice their nominal size. The oversizing indi-
cates that the finite sample 5% critical value is to the left of the nominal critical
value. One way of overcoming the inaccuracy of the nominal critical values
is to bootstrap the test statistics (see UR, Vol. 1, chapter 8). An approximate
estimate of the finite sample 5% cv can be obtained by finding the normal dis-
tribution that delivers a 10% rejection rate with a critical value of −1.645; such
a distribution has a mean of −0.363 (rather than 0) and a (left-sided) 5% cv of
−2.0. Overall, in terms of s.a. power, LM0 is the best of the tests, although the dif-
ferences between the EFDF tests and the LM tests are relatively slight and tγ(d̂1∗)
is also competitive (see Figure 3.11a); when the alternative is close to the unit
root, tγ(d1∗) and tγ(d̂LW) also become competitive tests (see Figure 3.11b), whilst
away from the unit root tγ(d̂LW) is least powerful. Thus, LM0 is recommended
in this case as it is close to its nominal size and the best overall in terms of (s.a.)
power.
The general oversizing results for T = 200 suggest that it would be of interest to
increase the sample size for a better picture of what is happening to the empirical
size of the tests, and Table 3.5b presents a summary of the results for T = 1,000.
In this case, tγ(d1∗), tγ(d̂LW) and tα are now much closer to their nominal size
and LM0 maintains its relative fidelity; there are improvements in the empirical
size of tγ(d̂1∗), tη(d̂LW) and tη(d̂ELW), but these remain oversized. The s.a. power
confirms that the LM tests still have a marginal advantage in terms of power,
whereas tγ(d̂LW) is clearly the least powerful. Overall, the advantage does again
seem to lie with the LM-type tests, LM0 and tα.
The various tests for a fractional root are illustrated with a time series on US
wheat production; the data is in natural logarithms and is annual for the period
1866 to 2008, giving a sample of 143 observations. The data, yt, and the data
detrended by a regression on a constant and a linear trend, ŷt, are graphed
Table 3.5a Power of various tests for fractional d (detrended data), T = 200
d     tγ(d1∗)  tγ(d̂LW)  tγ(d̂1∗)  tη(d̂LW)  tη(d̂ELW)  LM0    tα
power
0.90  0.541    0.449     0.650     0.630     0.630      0.453  0.580
0.95  0.270    0.219     0.345     0.311     0.311      0.187  0.274
1.00  0.082    0.066     0.124     0.108     0.115      0.055  0.093
size-adjusted power
0.60  1.000    1.000     1.000     1.000     1.000      1.000  1.000
0.65  1.000    1.000     1.000     1.000     1.000      1.000  1.000
0.70  0.999    0.998     1.000     1.000     1.000      0.999  1.000
0.75  0.977    0.974     0.990     0.993     0.991      0.992  0.993
0.80  0.899    0.877     0.928     0.930     0.929      0.925  0.927
0.85  0.698    0.655     0.735     0.735     0.733      0.740  0.742
0.90  0.422    0.381     0.430     0.425     0.424      0.437  0.427
0.95  0.176    0.167     0.170     0.165     0.166      0.178  0.174
1.00  0.050    0.050     0.050     0.050     0.050      0.050  0.050
Table 3.5b Power of various tests for fractional d, (detrended data), T = 1,000
Figure 3.11a Power of the tests (detrended data), T = 200, d ∈ (0.6, 1.0)
Figure 3.11b Power of the tests (detrended data), T = 200, d ∈ (0.9, 1.0)
Figure 3.12a US wheat production (natural logarithms), 1866–2008
Figure 3.12b Detrended US wheat production (residuals from a linear trend), 1866–2008
in Figures 3.12a and 3.12b, respectively; the data is obviously trended, whilst
the detrended data shows evidence of a long cycle and persistence. A standard
approach would be to apply a unit root test to either yt or ŷt. To illustrate,
we calculate the Shin and Fuller (1998) exact ML test for a unit root, denoted
LRUC,β and based on the unconditional likelihood function, applied to yt, and
the version of the ADF test appropriate to a detrended alternative, that is, τ̂β
(see UR, Vol. 1, chapters 3 and 6).
ML estimation of an ARMA(p, q) model considering all lags to a maximum
of p = 3 and q = 3 led to an ARMA(1, 1) model selected by AIC and BIC. An
ADF(1) was selected based on a marginal t selection criterion using a 10% signif-
icance level. The estimation and test results are summarized in Table 3.6, with
‘t’ statistics in parentheses.
The results show that whilst the ML test does not reject the null hypothesis of a
unit root, the ADF test leads to the opposite conclusion and both are reasonably
robust in that respect to variations in the significance level.
ADF(1) estimation
Δyt = −0.317 yt−1 − 0.142 Δyt−1
      (−4.72)      (−1.71)
τ̂β = −4.52, 5% cv = −3.44
Table 3.7 Tests with a fractionally integrated alternative; p = 0 and p = 2 for augmented
tests
p = 2
            tγ(d1∗)  tγ(d̂LW)  tγ(d̂ELW)  tη(d̂LW)  tη(d̂ELW)  LM0,0  tα
test value  −2.54    −2.43     −3.23      −1.93     −3.19      −4.88  −3.00
Notes: d1∗ = 0.691 if no serial dependence is allowed (p = 0); d1∗ = 0.901 for p = 2; throughout, the
limiting 5% cv is −1.645, from the standard normal distribution; the two-step estimation procedure is used
for tη(d̂LW) and tη(d̂ELW) when p = 2 (see Section 3.5.7); similar results were obtained from nonlinear
estimation.
The tests in this chapter allow for the possibility that the alternative is a
fractionally integrated, but still nonstationary, process. The test results are sum-
marised in Table 3.7. The ML and ADF models suggest that the tests have to take
into account the possibility of serially dependent errors, but for comparison the
tests are first presented in their simple forms.
As a brief recap, in the case of weakly dependent errors the test statistics are
obtained as follows. In the case of the (LV version of the) FDF test, the suggested
augmented regression (see Section 3.8.3) is:
Δy̆t = γΔ^{d1} y̆t−1 + Σ_{j=1}^p ϕj Δŷt−j + ut∗(d1) (3.200)
This is referred to as the prewhitened AFDF, PAFDF, where y̆t = ŷt − Σ_{j=1}^p ϕ̂j ŷt−j, and
ϕ̂(L), a p-th order polynomial, is obtained from fitting an AR(p) model to Δŷt,
where ŷt is the detrended data. The test statistic is the t-type test associated with
γ, tγ(d1), and the optimal d1 depends on the order of ϕ(L) for local alternatives
(and the values of the ϕj coefficients for fixed alternatives); see Table 3.3. We find
p = 2 is sufficient to ‘whiten’ Δŷt. Reference to Table 3.3 indicates that d1∗ = 0.901
for the choice of p = 2.
For the EFDF test, the augmented regression is:
Δŷt = η[ϕ(L)zt−1(d̂T)] + [1 − ϕ(L)]Δŷt + vt (3.201)
The test statistic is the t test associated with η, tη(d̂T). This regression is non-
linear in the coefficients because of the multiplicative form ηϕ(L), and the ϕ(L)
coefficients also enter through the second set of terms. As noted above, there are
several ways to deal with this and the illustration uses the two-stage procedure
(see Section 3.5.7); see Q3.11 for the nonlinear estimation method. For p = 2,
the first stage comprises estimation of the AR(2) coefficients obtained from:
Δ^{d̂T} ŷt = ϕ̂1 Δ^{d̂T} ŷt−1 + ϕ̂2 Δ^{d̂T} ŷt−2 + ξ̂t
The second stage is then the LS regression:
Δŷt = η wt−1 + ϕ1 Δŷt−1 + ϕ2 Δŷt−2 + v̂t
where wt−1 = (1 − ϕ̂1L − ϕ̂2L²)zt−1(d̂T), with ϕ̂1 and ϕ̂2 obtained from the first
stage.
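The two-stage mechanics can be sketched as follows; the EFDF regressor zt−1(d̂T) is as defined earlier in the chapter and is supplied as an input here, and the function names are illustrative:

```python
import numpy as np

def fracdiff(x, d):
    """(1 - L)^d x via the truncated binomial expansion."""
    n = len(x)
    pi = np.ones(n)
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.array([pi[:t + 1] @ x[t::-1] for t in range(n)])

def efdf_two_stage(y, d_hat, z):
    """Two-stage EFDF with AR(2) dynamics; z[t] holds the EFDF regressor
    z_t(d_hat), built as defined in the chapter and supplied by the caller.
    Stage 1: LS fit of an AR(2) to x_t = (1 - L)^{d_hat} y_t.
    Stage 2: regress dy_t on w_{t-1} = (1 - phi1 L - phi2 L^2) z_{t-1}
    and dy_{t-1}, dy_{t-2}; return the t-ratio on eta."""
    xd = fracdiff(y, d_hat)
    # stage 1: AR(2) coefficients phi1_hat, phi2_hat
    A = np.column_stack([xd[1:-1], xd[:-2]])
    phi = np.linalg.lstsq(A, xd[2:], rcond=None)[0]
    # stage 2: filtered regressor and LS regression
    dy = np.diff(y)
    w = z[2:-1] - phi[0] * z[1:-2] - phi[1] * z[:-3]
    X = np.column_stack([w, dy[1:-1], dy[:-2]])
    lhs = dy[2:]
    b, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    v = lhs - X @ b
    s2 = v @ v / (len(lhs) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return b[0] / se

rng = np.random.default_rng(3)
y = np.cumsum(rng.standard_normal(300))   # a unit-root null for illustration
# stand-in regressor for the mechanics only; use the chapter's z_{t-1}(d_hat)
z = fracdiff(y, 0.7)
t_eta = efdf_two_stage(y, d_hat=0.7, z=z)
```

The point of the two stages is that the nonlinear restriction ηϕ(L) in (3.201) is replaced by a linear regression once ϕ̂1 and ϕ̂2 are fixed from the first stage.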
The relevant version of the LM test is LM0,0, given by:
LM0,0 = (√T/ω̂0) Σ_{j=1}^{T−1} j^{−1} ρ̂(j) ⇒D N(0, 1) under the null
ρ̂(j) = Σ_{t=j+1}^T ε̂t ε̂t−j / Σ_{t=1}^T ε̂t²
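A minimal sketch of the feasible statistic, assuming the residuals ε̂t are available and taking ω̂0 = (π²/6)^{1/2} as the default no-short-run-dynamics value:

```python
import numpy as np

def lm00(eps, omega0=(np.pi ** 2 / 6) ** 0.5):
    """Feasible LM statistic: (sqrt(T)/omega0) * sum_{j=1}^{T-1} j^{-1} rho(j),
    with rho(j) = sum_{t=j+1} e_t e_{t-j} / sum_t e_t^2.
    Under the null, LM_{0,0} => N(0, 1); omega0 defaults to
    sqrt(pi^2/6) ~= 1.2825, the no-short-run-dynamics case."""
    T = len(eps)
    denom = np.sum(eps ** 2)
    s = sum(np.sum(eps[j:] * eps[:-j]) / denom / j for j in range(1, T))
    return np.sqrt(T) / omega0 * s

rng = np.random.default_rng(1)
stat = lm00(rng.standard_normal(400))   # white-noise residuals: no rejection expected
```

With serially dependent errors, ω̂0 would instead be estimated under the null, as described in the text.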
The various test results are illustrated with p = 2 and summarised in Table 3.7.
The results suggest that the unit root null should be rejected at usual significance
levels; moreover the semi-parametric estimates of d, which are d̂LW = 0.765 and
d̂ELW = 0.594, with m = T0.65 , tend to confirm this view. (Note that as d̂ELW is
further from the unit root than d̂LW , the test statistics are more negative when
d̂ELW is used.) The simulation results reported in Table 3.5 suggested that LM0
maintained its size (at least at 5%), whereas the other tests had an actual size
about twice the nominal size, that is the finite sample critical value was to the
left of the nominal critical value. The test results are generally robust to likely
variations due to the oversizing found using critical values from the standard
normal distribution.
It is possible to set up different null hypotheses of interest; one in particular is
H0: d = 0.5 against HA: d ∈ (0.5, 1], which is a test of borderline nonstationarity,
and it is simple enough to apply the various tests to this situation; for details
see Tanaka (1999) and Breitung and Hassler (2002) for the LM tests and DGM
(2009) for the extension of the EFDF test.
This chapter has introduced some key concepts to extend the range of inte-
grated processes to include the case of fractional d. These are of interest in
themselves and for their use in developments of the I(d) literature to fraction-
ally cointegrated series. The first question to be addressed is: can a meaning
be attached to the fractional difference operator applied to a series of obser-
vations? The answer is yes and relies upon an expansion of (1 − L)d using the
binomial theorem either directly or through a gamma function representation,
whilst the treatment of the initial condition gives rise to two types of fraction-
ally integrated processes. Corresponding developments have been made in the
mathematical analysis of fractionally integrated processes in continuous time;
see for example Oldham and Spanier (1974), Kalia (1993) and Miller and Ross
(1993).
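The binomial expansion of (1 − L)^d is straightforward to apply to a finite sample; a minimal sketch of type II fractional differencing (pre-sample values set to zero):

```python
import numpy as np

def fracdiff(y, d):
    """Apply (1 - L)^d via the binomial expansion: coefficients
    pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j, truncated at the
    sample start (type II: y_t = 0 for t <= 0)."""
    n = len(y)
    pi = np.ones(n)
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.array([pi[:t + 1] @ y[t::-1] for t in range(n)])

y = np.array([1.0, 2.0, 3.0, 4.0])
d1y = fracdiff(y, 1.0)   # d = 1: first differences, with y_1 retained at t = 1
# applying d = 0.4 then d = 0.6 reproduces d = 1, since the exponents add
```

For integer d the coefficients terminate (e.g. for d = 1, π = (1, −1, 0, . . .)); for fractional d they decay hyperbolically, which is the source of long memory.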
Most of the analysis of this chapter was in the time domain, with the fre-
quency domain approach to be considered in the following chapter. Once
fractional d is allowed, the range of hypotheses of interest is extended quite nat-
urally. The set-up is no longer simply that of nonstationarity, d = 1, to be tested
against stationarity, d = 0; one obvious generalisation is d = 1 against d ∈ [0, 1),
but others may be of interest in context, for example d = 0 against d ∈ (0, 1]
or d = 0.5 (borderline nonstationary) against d ∈ (0.5, 1]. Processes with d > 0
have long memory in the sense that the autocorrelations are not summable,
declining hyperbolically for 0 < d < 1, whilst a process with d ≥ 0.5 is also
nonstationary.
One way to approach the testing problem is to extend the familiar frame-
work of Dickey and Fuller, as suggested by Dolado, Gonzalo and Mayoral (2002)
and further extended by Lobato and Velasco (2006, 2007) and Dolado, Gonzalo and Mayoral (2