Time Series Analysis Using R

This document discusses time series analysis of London housing prices using R. It begins with an exploratory data analysis of house prices in Westminster, finding prices ranged from £121,387 to £1,117,408 with an average of £521,837. Classical and Bayesian time series models are then fit to the data and compared. For the classical model, tests for stationarity and transformations are applied before fitting ARIMA/SARIMA models. For the Bayesian model, prior distributions are specified and parameters are estimated. The models are compared based on goodness of fit to select the best for forecasting future house prices.

Uploaded by

John Kalar


Table of Contents

Introduction
    Overview
Exploratory Data Analysis
Model fitting and Forecasting
    Time Series
    Time series with Bayesian approach
Conclusions
References
Appendix
    R scripts

Introduction
Overview
In economic terms, the housing market, like any market, is usually determined by supply and demand. However, although the market can reach equilibrium, this can lead to social problems such as housing unaffordability. In the particular case of London, England, the last few decades have seen prices rise significantly: at the end of last year, for the first time in history, the average cost per property exceeded the £500,000 mark (Antonakakis, 2018). This makes London the most expensive region in the UK to live in. Returning to the problem of housing unaffordability, it should be noted that the rate of growth of housing prices has come to exceed the increase in individual incomes. In some countries, by 2014 housing costs had even reached 10 times the average wage, whereas in 1997 they were only a few times the average wage.

As a result of the pandemic caused by the COVID-19 virus, a concentration of demand for housing has been reported in England during this period. This has several origins. First-time buyers do not face the complications of moving house, such as securing a target home, making arrangements and payments, and planning repairs and refurbishments, so they have had an incentive to buy properties, which are normally at the lower end of house prices. This, in turn, has led to an increase in demand and hence higher prices. Another factor that has played a role in the increase in prices is the change in people's housing preferences as a result of the strict confinement measures decreed by the government at the end of March 2020.

Exploratory Data Analysis

A database containing information on dwellings in London was used. The variables it contains are date, area, average house price (recorded in GBP, £), area code, number of houses sold and an indicator of whether the area is a London borough or not. The data are recorded monthly from January 1995 to January 2020, and 45 areas are considered as London regions; therefore, each region has 301 observations per variable. For the purposes of this analysis, the study analysed the time series of average house prices in order to make a prediction. In particular, the study works with the Westminster region. Descriptive statistics of the average price variable for the chosen region are presented in Table 1.

Table 1: Descriptive statistics of House Price (Mean)

                     Minimum     Mean        Maximum

Average price (£)    121,387     521,837     1,117,408
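
The figure of 301 observations per region follows from counting the months between January 1995 and January 2020, both included. A minimal Python sketch of that arithmetic is given below; the helper name `months_inclusive` is hypothetical and is not part of the study's R scripts.

```python
def months_inclusive(start_year, start_month, end_year, end_month):
    """Count monthly observations from the start month to the end month, both included."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


# January 1995 to January 2020: 25 full years of 12 months, plus the final January.
n_obs = months_inclusive(1995, 1, 2020, 1)
print(n_obs)  # 301
```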

The graphs in Figure 1 contain a normality check and histograms of the residuals. No anomaly is observed that would lead the study to suspect a problem with the model or that the residuals are not white noise.

Figure 1: House price normality test

As expected, the minimum corresponds to the first observation (that is, January 1995), and the maximum was reached in February 2018. As this is a time series, the study cares more about the most recent data. The data from 2018 have a mean of £991,349, so the study would expect the predictions to be around this value.

Model fitting and Forecasting

In the classical time series approach, broadly speaking, the study has a series of values that are correlated because of their dependence on time (Laptev et al., 2017). Through different techniques, the study sought to analyse the information obtained in order to identify a pattern that describes this information over time. This pattern is then extended to a specific period of time in order to produce a forecast. Time series models under the Bayesian approach have the same mathematical structure as under the classical approach, such as the Box-Jenkins model, but they differ in the estimation of the parameters: these are considered random variables and as such have a probability distribution associated with them (Laptev et al., 2017).

First, a Breusch-Pagan test is performed to verify the homoscedasticity of the observations. If heteroscedasticity exists, a Box-Cox transformation is applied to the data to correct it, and the Breusch-Pagan test is performed again to re-check this condition (Đalić & Terzić, 2021). Once homoscedasticity holds, differencing is used to make the series stationary, since stationarity is necessary to fit an ARMA or ARIMA model. Then, the Dickey-Fuller and KPSS tests are performed to check for stationarity in the series (Fedorová, 2016). Finally, based on the previous results, an ARIMA or SARIMA fit is proposed and a prediction is made for 5 future values.
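
The Box-Cox transformation named in this procedure has a simple closed form, sketched below in Python for illustration. This is not the study's R code: the function names `box_cox` and `inv_box_cox` are hypothetical, the value lambda = 0.5 is an arbitrary example rather than the value the study estimated, and only strictly positive data (such as prices) are handled.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: log(y) when lam == 0, otherwise (y**lam - 1) / lam."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch only handles strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inv_box_cox(values, lam):
    """Inverse transform, used to bring forecasts back to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in values]


# The three prices reported in Table 1, used here only as sample inputs.
prices = [121387.0, 521837.0, 1117408.0]
transformed = box_cox(prices, 0.5)    # lambda = 0.5 is an assumed example value
recovered = inv_box_cox(transformed, 0.5)
```

Because forecasts are produced on the transformed scale, the inverse is what makes them readable as prices again.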

For the Bayesian model, the procedure is as follows. Firstly, the study works with the already differenced series used with the classical model (Xiao et al., 2017). Then, initial values for the parameters of the prior distributions are defined, with the aim of estimating the moving-average terms. The burn-in is performed and the number of Markov chains for the estimation is established. Following Xiao et al. (2017), the convergence of the parameters is checked graphically and by means of a Gelman test. Finally, a prediction is made for 5 future values. To conclude the above procedure, the classical and Bayesian models are compared according to goodness-of-fit criteria, and it is determined which one offers a better model. Lastly, a new Bayesian model is proposed in order to find a better estimate of the variance, testing several models, comparing them in terms of their p-values and requiring that there is no correlation between their parameters.

Time Series
A time series is the succession of observations generated by a stochastic process whose index is taken relative to time. In a time series it is assumed that there is a correlation structure between observations; they are not independent (Woodward et al., 2017). The study plotted the time series of the price column for the Westminster region using R, as shown in Figure 2.

Figure 2: Time series price trend

For the time series, it was essential to have constant variance, so the study performed a homoscedasticity test, the Breusch-Pagan test, with hypotheses:

H0: The data are homoscedastic (have constant variance).

H1: The data are heteroscedastic (the variance is not constant).
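
To make these hypotheses concrete, the following is a minimal Python sketch of a studentized (Koenker) Breusch-Pagan test for a regression on a single regressor, here a time index. It is an illustration, not the study's R code: the function names and the two toy series are assumptions, and the regression the study actually tested is not stated in the text.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def breusch_pagan(x, y):
    """Return (LM statistic, p-value) for H0: constant residual variance."""
    n = len(x)
    a, b = fit_line(x, y)
    sq_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression of the squared residuals on the regressor.
    c, d = fit_line(x, sq_resid)
    mean_sq = sum(sq_resid) / n
    ss_total = sum((v - mean_sq) ** 2 for v in sq_resid)
    if ss_total == 0.0:
        return 0.0, 1.0
    ss_resid = sum((v - c - d * xi) ** 2 for xi, v in zip(x, sq_resid))
    r_squared = 1.0 - ss_resid / ss_total
    lm = max(0.0, n * r_squared)  # guard against tiny negative rounding error
    # With one regressor, LM is chi-square with 1 degree of freedom under H0.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


t = list(range(100))
steady = [ti + (-1) ** ti for ti in t]              # spread stays constant over time
growing = [ti + 0.1 * ti * (-1) ** ti for ti in t]  # spread grows with time
lm_steady, p_steady = breusch_pagan(t, steady)
lm_growing, p_growing = breusch_pagan(t, growing)
```

A p-value below 0.05 leads to rejecting H0, which is the situation in which a variance-stabilising transformation is worth trying.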

The analysis obtained a p-value < 0.05, so H0 is rejected and the data are not homoscedastic. The Box-Cox transformation was therefore used to make the time series homoscedastic (constant variance). The BoxCox command applies a transformation to the data according to a parameter lambda. To find a lambda value that generates a suitable transformation, the study used the command BoxCox.lambda and then transformed the data with BoxCox using that lambda (Bauer et al., 2019). All this was done under the Guerrero method, which is simply the way lambda is calculated. The loglik method could have been used instead by changing "guerrero" to "loglik"; however, this would change the value of lambda and consequently the transformation. As "guerrero" worked well, the study kept this method.

Finally, the study re-ran the test and obtained a p-value = 0.2557, so H0 was not rejected; that is, the transformed data were homoscedastic. Stationarity was then checked using two tests: the Dickey-Fuller test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. As the series did not pass the Dickey-Fuller test, the study differenced it, after which it passed both the Dickey-Fuller and the KPSS tests, as presented in Figure 3.
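
Differencing replaces each value by its change from the previous one, which removes a trend in the level of the series. A minimal Python sketch, with a hypothetical helper name and toy numbers rather than the study's data:

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[i] - series[i - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


toy_prices = [100.0, 110.0, 125.0, 145.0]
changes = difference(toy_prices)  # [10.0, 15.0, 20.0]

# A straight-line trend becomes a constant after one difference.
trend = [3.0 + 2.0 * t for t in range(10)]
flat = difference(trend)
```

One difference shortens the series by one observation, so a series of 301 monthly values yields 300 differences.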

Figure 3: Decomposition of the time series.

Thereafter, the analysis used the sample ACF and PACF to get an idea of the number of lags, so that the study could propose an ARIMA or SARIMA fit. Theoretically, an AR(p) will have the first p lags of the PACF outside the confidence bands, after which the lags quickly tend to zero; similarly, an MA(q) will have the first q lags of the ACF outside the confidence bands, after which the lags quickly tend to zero. An ARMA(p,q) is therefore expected to combine both behaviours, as shown in Figure 4.
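
As an illustration of how these quantities are obtained, the Python sketch below computes the sample ACF directly and the sample PACF with the Durbin-Levinson recursion, then applies both to a simulated AR(1) series. The simulation settings (coefficient 0.7, 2000 points, seed 42) and all function names are assumptions for the example; the study itself used R.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (r_0 is 1 by construction)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = [1.0]
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Simulated AR(1) series: y_t = 0.7 * y_{t-1} + noise (assumed example).
random.seed(42)
series = [0.0]
for _ in range(1999):
    series.append(0.7 * series[-1] + random.gauss(0.0, 1.0))

acf = sample_acf(series, 5)
pacf = sample_pacf(series, 5)
band = 1.96 / math.sqrt(len(series))  # approximate 95% confidence band
```

For this AR(1) example the PACF at lag 1 should sit well outside the band while higher lags fall close to zero, whereas the ACF decays gradually.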

Figure 4: Differencing of the time series

By means of auto.arima, the analysis obtained the model ARIMA(2,0,3) with non-zero mean, with AIC = -1402.07, AICc = -1401.68 and BIC = -1376.14. Finally, the study used forecast to obtain the prediction graph; the results are shown in Figure 5.
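
The three criteria reported above are linked by standard formulas, so they can be cross-checked against each other. The Python sketch below does this under two assumptions that the text does not state: k = 7 estimated parameters (2 AR, 3 MA, the mean and the innovation variance) and n = 300 observations (301 monthly values minus one lost to differencing).

```python
import math


def aicc_from_aic(aic, k, n):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


def bic_from_aic(aic, k, n):
    """BIC expressed through AIC: AIC + k * (ln(n) - 2)."""
    return aic + k * (math.log(n) - 2.0)


aic = -1402.07  # value reported in the text
k = 7           # assumption: 2 AR + 3 MA + mean + variance
n = 300         # assumption: 301 observations, one lost to differencing

aicc = aicc_from_aic(aic, k, n)
bic = bic_from_aic(aic, k, n)
print(round(aicc, 2), round(bic, 2))
```

Under these assumptions both derived values agree with the reported AICc and BIC to within rounding of the AIC, which suggests the three figures are mutually consistent.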

Figure 5: Projection of future values with confidence bands showing the full series

Figure 6: Projection of future values with confidence bands showing the series from 2016

Time series with Bayesian approach

The study then attempted to fit the ARIMA(2,0,3) model proposed in the classical approach using JAGS and the forecast package. For this, the study had to define the model equation, which requires two autoregressive parameters (called ρ1 and ρ2) and 3 latent variables (θ1, θ2 and θ3) for the moving averages. Once this was done, an auxiliary variable z was defined to carry the moving averages forward. The results obtained are shown in Figure 7.
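
Written out, an ARMA(2,3) model with these parameter names takes the standard form below. The text does not give the equation, so this is a sketch: the symbols for the mean, the innovations and their variance are assumptions, as is the exact definition of the auxiliary term z, which is taken here to be the moving-average part.

```latex
\begin{aligned}
y_t &= \mu + \rho_1 (y_{t-1} - \mu) + \rho_2 (y_{t-2} - \mu) + z_t + \varepsilon_t, \\
z_t &= \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \theta_3 \varepsilon_{t-3}, \\
\varepsilon_t &\sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In the Bayesian fit each of these parameters would additionally receive a prior distribution; the priors used by the study are not listed in the text.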

Figure 7: Trace and density plots of the estimated parameters using 3 chains

Figure 8: Trace and density plots of the estimated parameters using 3 chains

It is observed that the traces and the densities converge. Moreover, they converge very close to zero, which could also be seen in the classical model. The behaviour of the residuals also appears normal, as is evident in Figure 9.
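
Visual convergence of the traces can be backed by the Gelman-Rubin statistic, which compares the variance between chains with the variance within them. The Python sketch below implements the basic (non-split) version on simulated draws; the chain count of three, the chain length and the function name are assumptions for the example, not the study's JAGS output.

```python
import math
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)   # W: average within-chain variance
    between = n * variance(chain_means)          # B: between-chain variance
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return math.sqrt(pooled / within)


random.seed(0)
# Three chains sampling the same distribution: the statistic should be near 1.
mixed = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# Three chains stuck around different values: the statistic should be far above 1.
stuck = [[random.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 5.0, 10.0)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 are usually read as evidence that the chains have mixed, while values well above 1 indicate that they have not.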


Figure 9: Residuals for the ARIMA (2,0,3) model using the Bayesian approach

From Figure 9, it can be seen that the residuals behave like white noise, since the ACF and PACF plots remain within the bands. The tests of the assumptions were performed, and the residuals passed the test of independence (Ljung-Box). However, in the graph of the fitted values against the observed ones, the study noticed that the variance is underestimated, since the fitted values are below the observed ones. The prediction is close to the one presented in the classical model, and it can be stated that it fits well. There were, though, issues from the beginning with the variance of the series not being constant. The study therefore looked for another model to fit with the auto.sarima command, which returned an ARIMA(1,0,2).
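
The Ljung-Box independence check mentioned above sums the squared residual autocorrelations up to a chosen lag. The Python sketch below is an illustration only: the function name, the choice of 10 lags and the test series are assumptions, and the p-value uses the Wilson-Hilferty normal approximation to the chi-square distribution rather than an exact tail probability.

```python
import math
import random


def ljung_box(residuals, lags, fitted_params=0):
    """Return (Q statistic, approximate p-value) for H0: no autocorrelation."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    df = lags - fitted_params  # degrees of freedom after removing fitted ARMA terms
    if df <= 0:
        raise ValueError("lags must exceed the number of fitted parameters")
    # Wilson-Hilferty approximation to the chi-square upper tail probability.
    z = ((q / df) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * df))) / math.sqrt(2.0 / (9.0 * df))
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return q, p_value


random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(300)]  # uncorrelated by construction
wave = [math.sin(t / 10.0) for t in range(300)]       # strongly autocorrelated

q_noise, p_noise = ljung_box(noise, 10)
q_wave, p_wave = ljung_box(wave, 10)
```

For residuals of an ARIMA(2,0,3) fit, `fitted_params` would be 5, so more than 5 lags are needed for the test to be defined.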

The study attempted to fit different models with the ggplot2 package. Most of them did not pass the constant variance test, so the study kept the ones with the highest p-value for this test and with no correlation in the parameters. The models that were kept are ARIMA(1,0,2), ARIMA(2,0,3), ARIMA(4,0,2), ARIMA(2,0,1) and ARIMA(3,0,3). Finally, the model prediction was obtained; the results are shown in Figure 10.

Figure 10: Model prediction (the best model obtained was ARIMA(1,0,2))

Comparing these models using the results in the figures above, it can be concluded that the one that seemed to have the best fit was the ARIMA(1,0,2), which was in fact the one suggested by the auto.sarima command.

Conclusions
In the end, there was no single best model for the time series fit. The study attempted to use the model obtained in the classical approach for the Bayesian part, but for these methods it was not the best option. This shows the importance of having different ways of approaching the modelling, and how difficult it can be to reach a good fit, especially in the Bayesian approach, because it requires more computational work. This may not have been much of a burden with the study data, but databases with millions of records can make trying different models complicated. The R scripts used to obtain the study results are annexed.

References
Antonakakis, N., 2018. Rethinking London's 'ripple effect' on house prices: other UK regions transmit shocks too. British Politics and Policy at LSE.

Bauer, A., Züfle, M., Herbst, N. and Kounev, S., 2019, June. Best practices for time series forecasting (tutorial). In 2019 IEEE 4th International Workshops on Foundations and Applications of Self* Systems (FAS*W) (pp. 255-256). IEEE.

Đalić, I. and Terzić, S., 2021. Violation of the assumption of homoscedasticity and detection of heteroscedasticity. Decision Making: Applications in Management and Engineering, 4(1), pp.1-18.

Database obtained from https://www.kaggle.com/justinas/housing-in-london

Fedorová, D., 2016. Selection of unit root test on the basis of length of the time series and value of AR(1) parameter. Statistika, 96(3), p.3.

Laptev, N., Yosinski, J., Li, L.E. and Smyl, S., 2017, August. Time-series extreme event forecasting with neural networks at Uber. In International Conference on Machine Learning (Vol. 34, pp. 1-5).

Woodward, W.A., Gray, H.L. and Elliott, A.C., 2017. Applied time series analysis with R. CRC Press.

Xiao, Q., Chaoqin, C. and Li, Z., 2017. Time series prediction using dynamic Bayesian network. Optik, 135, pp.98-103.

Appendix
R scripts
