Time series Analysis using R 1
Time series Analysis using R
Module Code
Module Name
Course
Your Name
School
Date
Table of Contents
Introduction
    Overview
Exploratory Data Analysis
Model fitting and Forecasting
    Time Series
    Time series with Bayesian approach
Conclusions
References
Appendix
    R scripts
Introduction
Overview
In economic terms, the housing market, like any market, is largely determined by supply and
demand. However, even when the market reaches equilibrium, the outcome can create social
problems such as housing unaffordability. In the particular case of London, England, the last
few decades have seen prices rise significantly: at the end of last year alone, for the first
time in history, the average cost per property exceeded the £500,000 mark (Antonakakis, 2018).
This makes London the most expensive region in the UK to live in. Returning to the problem of
housing unaffordability, the rate of growth of housing prices has come to exceed the increase
in individual incomes. In some countries, housing costs in 2014 exceeded the average wage by a
factor of 10, whereas in 1997 they exceeded it by only a factor of several.
As a result of the COVID-19 pandemic, demand for housing in England has reportedly become
concentrated during this period. This has several origins. First-time buyers do not face the
complications of moving house, such as securing a target home, making arrangements and
payments, and planning repairs and refurbishments, so they have had an incentive to buy
properties, which are normally at the lower end of house prices. This, in turn, has led to an
increase in demand and hence higher prices. Another factor that has played a role in the
increase in prices is the change in people's housing preferences as a result of the strict
confinement measures decreed by the government at the end of March 2020.
Exploratory Data Analysis
A database containing information on dwellings in London was used. The variables it
contains are date, area, average house price (recorded in GBP), area code, number of houses
sold and an indicator of whether the area is a London borough or not. The data are recorded
monthly from January 1995 to January 2020 and cover 45 areas considered as London regions;
each region therefore has 301 observations per variable. For the purposes of this analysis,
the study analysed the time series of average house prices to make a prediction. In
particular, the study works with the Westminster region. Descriptive statistics of the
average price variable for the chosen region are presented in Table 1.
Table 1: Descriptive statistics of House Price (Mean)
                 Minimum    Medium     Maximum
Average price    121,387    521,837    1,117,408
Figure 1 shows the normality analysis, including the histogram of the residuals. No
anomaly is observed that would lead the study to suspect that the residuals are anything
other than white noise.
Figure 1: House price normality test
As expected, the minimum corresponds to the first observation (that is, January 1995)
and the maximum was reached in February 2018. As it is a time series, the most recent data
matter most. The data for 2018 have a mean of 991,349, so the study would expect the
predictions to be around this value.
Model fitting and Forecasting
In the classical time series approach, broadly speaking, the study has a series of
values that are correlated because of their dependence on time (Laptev et al., 2017). Through
different techniques, the study sought to analyse the information obtained in order to
identify a pattern that describes this information over time. Subsequently, this pattern is
extended to a specific period of time in order to produce a forecast. Time series models under
the Bayesian approach have the same mathematical structure as those of the classical
approach, such as the Box-Jenkins model, but they differ in the estimation of the parameters:
these are treated as random variables and as such have a probability distribution associated
with them (Laptev et al., 2017).
First, a Breusch-Pagan test is performed to verify the homoscedasticity of the
observations. If heteroscedasticity is found, a Box-Cox transformation is applied to the data
to correct it, and the Breusch-Pagan test is repeated to check the condition again (Đalić &
Terzić, 2021). Once homoscedasticity holds, differencing is used to make the series
stationary, since stationarity is required to fit an ARMA model. Then, the Dickey-Fuller and
KPSS tests are performed to check for stationarity in the series (Fedorová, 2016). Finally,
based on the previous results, an ARIMA or SARIMA fit is proposed and a prediction is made
for 5 future values.
For the Bayesian model, the procedure is as follows. Firstly, the study works with the
same differenced series that is used with the classical model (Xiao et al., 2017). Then,
initial values of the parameters for the prior distributions are defined, with the aim of
estimating the moving-average terms. The burn-in is performed and the number of Markov
chains for the estimation is established. Following Xiao et al. (2017), the convergence of
the parameters is checked graphically and by means of a Gelman test. Finally, a prediction is
made for 5 future values. To conclude the above procedure, the classical and Bayesian models
are compared according to goodness-of-fit criteria, and it is determined which one offers a
better model. Finally, a new Bayesian model is proposed in order to find a better estimate of
the variance, testing several models and comparing them in terms of their p-values, favouring
those with no correlation between their parameters.
Time Series
A time series is the succession of observations generated by a stochastic process whose
index is taken relative to time. In time series it is assumed that there is a correlation
structure between observations; that is, they are not independent (Woodward et al., 2017).
The study built a time series from the price column for the Westminster region using R, as
shown in Figure 2.
Figure 2: Time series price trend
For the time series it was essential to have constant variance, so the study performed a
homoscedasticity test, the Breusch-Pagan test, with hypotheses:
H0: The data are homoscedastic (have constant variance).
H1: The data are heteroscedastic (the variance is not constant).
The analysis obtained a p-value < 0.05, so H0 is rejected and the data are not
homoscedastic. A Box-Cox transformation was therefore used to make the time series
homoscedastic (constant variance). The BoxCox command applies a transformation to the data
according to a lambda parameter. To find a lambda value that generates a suitable
transformation, the study used the command BoxCox.lambda and then transformed the data with
BoxCox using that lambda (Bauer et al., 2019). All this was done under the Guerrero method
("guerrero"), which is simply the way lambda is calculated. The log-likelihood method could
have been used instead by changing "guerrero" to "loglik"; however, this would change the
value of lambda and consequently the transformation. As "guerrero" worked well, the study
kept this method.
Finally, the study re-ran the test and obtained a p-value = 0.2557, so H0 was not
rejected; that is, the transformed data were homoscedastic. Stationarity was then checked
using two tests: the Dickey-Fuller (D-F) test and the Kwiatkowski-Phillips-Schmidt-Shin
(KPSS) test. As the series did not pass the D-F test, it was differenced, after which it
passed the D-F test as well as the KPSS test, as presented in Figure 3.
Figure 3: Decomposition of the time series.
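The variance-stabilising step described above (Breusch-Pagan test, Box-Cox transformation, re-test) can be sketched in base R. This is only an illustration under stated assumptions: the series is synthetic rather than the Westminster data, the test is the studentized Breusch-Pagan form applied to a linear-trend regression (the text does not say which regression was used), and lambda is fixed at a hypothetical value instead of being estimated with BoxCox.lambda. The helper names box_cox and bp_pvalue are invented for this sketch.

```r
# Box-Cox transform: (x^lambda - 1) / lambda, with log(x) as the lambda = 0 limit.
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Studentized (Koenker) Breusch-Pagan test for the regression y ~ time index.
# Returns the p-value; small values are evidence against constant variance.
bp_pvalue <- function(y) {
  time_idx <- seq_along(y)
  e2 <- residuals(lm(y ~ time_idx))^2          # squared residuals of the trend fit
  lm_stat <- length(y) * summary(lm(e2 ~ time_idx))$r.squared
  pchisq(lm_stat, df = 1, lower.tail = FALSE)  # LM statistic ~ chi-square(1)
}

set.seed(42)
n <- 600
# Synthetic series (NOT the study's data): the noise grows with the level.
y <- exp(0.005 * seq_len(n) + rnorm(n, sd = 0.2))

lambda <- 0  # hypothetical value; the study estimated lambda with BoxCox.lambda
p_raw <- bp_pvalue(y)                            # test on the raw series
p_transformed <- bp_pvalue(box_cox(y, lambda))   # test after the transformation

print(c(raw = p_raw, transformed = p_transformed))
```

The decision rule is the one used in the text: a p-value below 0.05 rejects H0 (constant variance), and the test is repeated on the transformed series.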
Thereafter, the analysis used the sample ACF and PACF to get an idea of the number of
lags, so that the study could propose an ARIMA or SARIMA fit. Theoretically, an AR(p) has the
first p lags of the PACF outside the confidence bands, after which the lags quickly tend to
zero; similarly, an MA(q) has the first q lags of the ACF outside the confidence bands, after
which the lags quickly tend to zero. An ARMA(p,q) combines both behaviours, as shown in
Figure 4.
Figure 4: Differencing of the time series
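The cut-off behaviour of the ACF and PACF described above can be checked on theoretical (not sample) autocorrelations with base R's ARMAacf. The coefficients below are hypothetical and chosen only for illustration; they are not estimates from the study.

```r
# Theoretical ACF/PACF from stats::ARMAacf (no simulation involved).
ar_coefs <- c(0.5, 0.3)   # hypothetical AR(2) coefficients
ma_coefs <- c(0.6)        # hypothetical MA(1) coefficient

# PACF of the AR(2) at lags 1..6: non-zero up to lag p = 2, zero afterwards.
ar_pacf <- ARMAacf(ar = ar_coefs, lag.max = 6, pacf = TRUE)

# ACF of the MA(1) at lags 1..6 (lag 0 dropped): non-zero up to lag q = 1, zero afterwards.
ma_acf <- ARMAacf(ma = ma_coefs, lag.max = 6)[-1]

print(round(ar_pacf, 3))
print(round(ma_acf, 3))
```

With sample data the same pattern is read from acf() and pacf() plots, where "zero" becomes "inside the confidence bands".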
By means of auto.arima, the analysis obtained the model ARIMA(2,0,3) with non-zero
mean, with AIC = -1402.07, AICc = -1401.68 and BIC = -1376.14. Finally, the study used
forecast to obtain the prediction graph; the results are shown in Figure 5.
Figure 5: Projection of future values with confidence bands showing the full series
Figure 6: Projection of future values with confidence bands showing the series from 2016
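The three information criteria reported for the ARIMA(2,0,3) fit are linked by standard formulas, so they can be cross-checked with a few lines of R. The parameter count k and the sample size n below are assumptions inferred from the description (they are not stated in the text): k counts the two AR terms, three MA terms, the mean and the innovation variance, and n is the 301 monthly observations less the one lost to differencing.

```r
# Values reported in the text for the ARIMA(2,0,3) fit with non-zero mean.
aic_reported  <- -1402.07
aicc_reported <- -1401.68
bic_reported  <- -1376.14

# Assumptions (not stated in the text).
k <- 2 + 3 + 1 + 1   # AR terms + MA terms + mean + innovation variance
n <- 301 - 1         # monthly observations minus one lost to differencing

# AICc = AIC + 2k(k+1)/(n-k-1);  BIC = AIC + k(log(n) - 2)
aicc_implied <- aic_reported + 2 * k * (k + 1) / (n - k - 1)
bic_implied  <- aic_reported + k * (log(n) - 2)

print(round(c(AICc = aicc_implied, BIC = bic_implied), 2))
```

Under these assumptions the implied AICc and BIC agree with the reported values to within rounding of the reported AIC.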
Time series with Bayesian approach
The study thereafter attempted to fit the ARIMA(2,0,3) model, proposed in the
classical approach, using JAGS and the forecast package. For this, the study has to define
the model equation, which requires two autoregressive parameters (called ρ1 and ρ2) and 3
latent variables (θ1, θ2 and θ3) for the moving averages. Once these were in place, an
auxiliary variable z was defined to carry forward the moving averages. The results obtained
are shown in Figure 7.
Figure 7: Trace and density plots of the estimated parameters using 3 chains
Figure 8: Trace and density plots of the estimated parameters using 3 chains
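For reference, the model equation described above can be written out as an ARMA(2,3) on the differenced series. This is a sketch of one common parameterisation, not necessarily the exact form coded in JAGS: the symbols μ (mean) and σ² (innovation variance) are introduced here for illustration, and reading the auxiliary variable z as the innovation term is an assumption.

```latex
\begin{aligned}
y_t - \mu &= \rho_1 (y_{t-1} - \mu) + \rho_2 (y_{t-2} - \mu)
            + z_t + \theta_1 z_{t-1} + \theta_2 z_{t-2} + \theta_3 z_{t-3}, \\
z_t &\sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In a Bayesian fit, ρ1, ρ2, θ1, θ2, θ3, μ and σ² each receive a prior distribution, and z_t is computed recursively from the observed series and the earlier values of z.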
It is observed that the traces and the densities converge. Besides, it can be seen
that they converge very close to zero, which could also be seen in the classical model. At
the same time, the behaviour of the residuals appears normal, which is also evident in
Figure 9.
Figure 9: Residuals for the ARIMA (2,0,3) model using the Bayesian approach
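The convergence of the chains, judged graphically in Figures 7 and 8, can also be summarised numerically by the Gelman-Rubin potential scale reduction factor. The sketch below implements the basic form of that statistic in base R on synthetic draws; it is not the study's JAGS output, and the function name rhat is invented here. In practice the gelman.diag function of the coda package is the usual tool and applies further corrections.

```r
# Basic potential scale reduction factor (R-hat) for one parameter.
# `chains` is a matrix with one column per chain and one row per draw.
rhat <- function(chains) {
  n <- nrow(chains)
  chain_means <- colMeans(chains)
  W <- mean(apply(chains, 2, var))    # average within-chain variance
  B <- n * var(chain_means)           # between-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

set.seed(1)
mixed <- matrix(rnorm(3 * 2000), ncol = 3)  # synthetic draws: 3 well-mixed chains
stuck <- mixed
stuck[, 3] <- stuck[, 3] + 3                # third chain centred somewhere else

print(c(mixed = rhat(mixed), stuck = rhat(stuck)))
```

Values close to 1 indicate that the chains agree with one another; values clearly above 1 indicate that at least one chain is exploring a different region and that more iterations or a longer burn-in are needed.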
From Figure 9, it can be seen that the residuals behave as white noise, since the ACF and
PACF plots remain within the bands. The tests of the assumptions were performed and the
residuals passed the test of independence (Ljung-Box). In the plot of the fitted values
against the observed ones, the study noticed that the variance is underestimated, since the
fitted values are below the observed ones. The prediction is close to the one presented in the
classical model, so the fit can be considered good. However, there were issues from the
beginning with the variance of the series not being constant. Another model was therefore
sought with the auto.sarima command, which returned an ARIMA(1,0,2).
The study attempted to fit different models with the ggplot2 package. It was observed
that most of them did not pass the constant variance test, so the study kept those with the
highest p-value for this test and with no correlation in the parameters. The models that
were kept are: ARIMA(1,0,2), ARIMA(2,0,3), ARIMA(4,0,2), ARIMA(2,0,1) and ARIMA(3,0,3).
Finally, the prediction of the model was obtained; the results are shown in Figure 10.
Figure 10: Model prediction (the best model obtained was the ARIMA(1,0,2))
Comparing these models using the results in the figures above, it can be concluded
that the one that seemed to have the best fit was the ARIMA(1,0,2), which was in fact the one
suggested by the auto.sarima command.
Conclusions
In the end there was no single best model for the time series fit. The study attempted
to work with the same model obtained in the classical approach for the Bayesian part, but for
these methods it was not the best option. The analysis showed the importance of having
different ways of approaching the modelling and how difficult it can be to reach a good fit,
especially in the Bayesian way, because it requires more computational work. This may not
have been much with the study data, but databases with millions of records can make trying
different models complicated. The R scripts used to obtain the study results are annexed.
References
Antonakakis, N., 2018. Rethinking London's 'ripple effect' on house prices: other UK regions
transmit shocks too. British Politics and Policy at LSE.
Bauer, A., Züfle, M., Herbst, N. and Kounev, S., 2019, June. Best practices for time series
forecasting (tutorial). In 2019 IEEE 4th International Workshops on Foundations and
Applications of Self* Systems (FAS*W) (pp. 255-256). IEEE.
Đalić, I. and Terzić, S., 2021. Violation of the assumption of homoscedasticity and detection
of heteroscedasticity. Decision Making: Applications in Management and
Engineering, 4(1), pp.1-18.
Database obtained from https://www.kaggle.com/justinas/housing-in-london
Fedorová, D., 2016. Selection of unit root test on the basis of length of the time series and
value of AR(1) parameter. Statistika, 96(3), p.3.
Laptev, N., Yosinski, J., Li, L.E. and Smyl, S., 2017, August. Time-series extreme event
forecasting with neural networks at Uber. In International conference on machine
learning (Vol. 34, pp. 1-5).
Woodward, W.A., Gray, H.L. and Elliott, A.C., 2017. Applied time series analysis with R.
CRC Press.
Xiao, Q., Chaoqin, C. and Li, Z., 2017. Time series prediction using dynamic Bayesian
network. Optik, 135, pp.98-103.
Appendix
R scripts