How To Make The Best Use Of Live Sessions
• Please log in 10 minutes before the class starts and check your internet connection to avoid network issues during the LIVE session
• All participants are muted by default to avoid background noise. The instructor will unmute you if required. Please use the “Questions” tab on your webinar tool to interact with the instructor at any point during the class
• Feel free to ask and answer questions to make your learning interactive. The instructor will address your queries at the end of the ongoing topic
• Raise a ticket through your LMS in case of any queries. Our dedicated support team is available 24 x 7 for your assistance
• Your feedback is very much appreciated. Please share feedback after each class, which will help us enhance your learning experience
Copyright © edureka and/or its affiliates. All rights reserved.
Course Outline
▪ Introduction to Python
▪ Sequences and File Operations
▪ Deep Dive-Functions, OOPS, Modules, Errors and Exceptions
▪ Introduction to Numpy, Pandas and Matplotlib
▪ Data Manipulation
▪ Introduction to Machine Learning with Python
▪ Supervised Learning - I
▪ Dimensionality Reduction
▪ Supervised Learning - II
▪ Unsupervised Learning
▪ Association Rules Mining and Recommendation Systems
▪ Reinforcement Learning
▪ Time Series Analysis
▪ Model Selection and Boosting
Time Series Analysis
Topics
The topics covered in this module are:
▪ Time Series Analysis
▪ Components of Time Series Analysis
▪ White Noise
▪ Dickey Fuller Test
▪ ACF vs PACF
Objectives
After completing this module, you should be able to:
▪ Explain Time Series Analysis (TSA)
▪ Define components of Time Series Analysis
▪ Understand white noise and how it causes irregularities
▪ Define stationary variables
▪ Convert non-stationary variables to stationary ones
▪ Perform the Dickey-Fuller test
▪ Differentiate between ACF and PACF
What is Time Series Analysis?
Definition
A time series can be defined as a set of data points that depend on time
Time acts as the independent variable used to estimate the dependent variables
Mathematically, a time series is a set of observations taken at specified times (usually at equal intervals)
A time series defined by the values $Y_1, Y_2, \dots$ of a variable $Y$ at times $t_1, t_2, \dots$ is given by: $Y = F(t)$
Figure: Time-dependent sales data
Importance of Time Series Analysis (TSA)
Business forecasting
Understanding past behavior
Planning for future operations
Evaluate current accomplishments
Components of Time Series Analysis
▪ Trend
▪ Seasonality
▪ Cyclical Patterns
▪ Irregular
Trend
• A gradual shift or movement to relatively higher or lower values over a long period of time
• When the time series shows a general upward pattern, we call it an uptrend
• When the time series shows a general downward pattern, we call it a downtrend
• If there is no trend, we call it a horizontal or stationary trend
Seasonality
• Upward or downward swings
• Repeating pattern within a fixed time period
• Usually observed within one year
• E.g., if you live in a country with cold winters and hot summers, your air conditioning costs go up in summer and down in winter
Cyclical Patterns
• Repeating up and down movements
• Usually extend over more than a year
• Don’t have a fixed period
• Much harder to predict
Irregular
• Erratic, unsystematic, ‘residual’ fluctuations
• Short duration and non-repeating
• Due to random variation or unforeseen events
• Presence of white noise
Let’s understand what white noise is and why it accounts for irregularity
White Noise
Describes the assumption that each element in a series is a random draw from a population
• Zero mean and constant variance
• Autoregressive (AR) and Moving Average (MA) models correct for violations of this white noise assumption
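The white noise assumption can be checked numerically. The sketch below is illustrative only: it assumes Gaussian draws with mean 0 and variance 1 (the slide does not name a distribution) and uses a fixed seed so the run is repeatable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Draw 5000 independent values from a normal population with mean 0, variance 1.
noise = [random.gauss(0.0, 1.0) for _ in range(5000)]

mean = statistics.fmean(noise)
variance = statistics.pvariance(noise)

# Lag-1 sample autocorrelation: close to zero when the draws are independent.
lag1 = sum((a - mean) * (b - mean) for a, b in zip(noise[1:], noise[:-1])) / (
    len(noise) * variance
)

print(f"mean={mean:.3f} variance={variance:.3f} lag-1 autocorrelation={lag1:.3f}")
```

The mean and lag-1 autocorrelation should come out near 0 and the variance near 1. A series whose autocorrelation is clearly non-zero violates the assumption, which is the situation AR and MA terms are meant to handle.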
AR Model
Value of a variable in one period is related to its values in previous periods

AR(p) is an autoregressive model with p lags:
$$y_t = \mu + \sum_{i=1}^{p} \gamma_i \, y_{t-i} + \epsilon_t$$
where $\mu$ is a constant, $\gamma_i$ is the coefficient for the $i$-th lagged variable, $y_t$ is the dependent variable at time $t$, $y_{t-i}$ is the independent variable at the previous time period $t-i$, and $\epsilon_t$ is the error at time $t$

AR(1) is expressed as:
$$y_t = \mu + \gamma y_{t-1} + \epsilon_t = \mu + \gamma (L y_t) + \epsilon_t \quad \text{or} \quad (1 - \gamma L)\, y_t = \mu + \epsilon_t$$
where $L$ is the lag operator, $L y_t = y_{t-1}$
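As a minimal sketch of the AR(p) recursion, the snippet below simulates a series directly from the formula. The function name `simulate_ar`, the zero start-up values, and the Gaussian errors are assumptions made for illustration and are not part of the slides.

```python
import random

def simulate_ar(mu, gammas, n, sigma=1.0, seed=0):
    """Simulate y_t = mu + sum_i gammas[i-1] * y_{t-i} + eps_t."""
    rng = random.Random(seed)
    p = len(gammas)
    y = [0.0] * p  # assumed start-up values for the first p lags
    for _ in range(n):
        # y[-i] is the value i periods back
        lagged = sum(g * y[-i] for i, g in enumerate(gammas, start=1))
        y.append(mu + lagged + rng.gauss(0.0, sigma))
    return y[p:]

series = simulate_ar(mu=1.0, gammas=[0.8], n=20000)
sample_mean = sum(series) / len(series)

# For a stationary AR(1), the long-run mean is mu / (1 - gamma) = 1 / 0.2 = 5.
print(f"sample mean = {sample_mean:.2f}")
```

The sample mean should land close to 5, which shows that the constant $\mu$ is not itself the mean of the series once lagged values feed back in.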
AR Model – Example
Suppose $y_t = 1$ in the current period (taking $\mu = 0$ and no further shocks). Then the values in the following periods are:
▪ AR(1) with $\gamma = 0.8$: 80% of the current period's value is carried into each following period, so the series decays gradually
▪ AR(1) with $\gamma = -0.8$: −80% of the current period's value is carried into each following period, so the sign flips every period. Hence, an oscillating curve
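The two cases can be worked through by hand or with a few lines of code. The sketch below assumes $\mu = 0$ and no further shocks after the starting value (neither is stated on the slide); `ar1_path` is a hypothetical helper name.

```python
def ar1_path(gamma, y0=1.0, steps=5):
    """Values of y after a starting value y0, with mu = 0 and no further shocks."""
    path = [y0]
    for _ in range(steps):
        path.append(gamma * path[-1])  # each period keeps a fraction gamma of the last
    return [round(v, 4) for v in path]

print(ar1_path(0.8))   # [1.0, 0.8, 0.64, 0.512, 0.4096, 0.3277]
print(ar1_path(-0.8))  # [1.0, -0.8, 0.64, -0.512, 0.4096, -0.3277]
```

With a positive coefficient the value shrinks smoothly towards zero. With a negative coefficient it shrinks at the same rate but alternates in sign, which is the oscillating curve.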
MA Model
Accounts for the possibility of a relationship between a variable and the residuals from previous periods

MA(q) is a moving average model with q lags:
$$y_t = \mu + \epsilon_t + \sum_{i=1}^{q} \theta_i \, \epsilon_{t-i}$$
where $\theta_i$ is the coefficient for the lagged error term at time $t-i$

MA(1) model is expressed as:
$$y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1}$$
MA Model – Example
Suppose $\epsilon = 1$ in the previous period. Then the share of that error retained in the current period is:
▪ MA(1) with $\theta = 0.7$: 70% of the previous period's residual is carried into the current period
▪ MA(1) with $\theta = -0.7$: −70% of the previous period's residual is carried into the current period
In both cases the shock drops out entirely one period later, because an MA(1) model looks back only one residual
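A short sketch makes the one-period memory of MA(1) concrete. It assumes $\mu = 0$ and a single unit shock at time 0 with no other shocks; `ma1_response` is a hypothetical helper name.

```python
def ma1_response(theta, shock=1.0, steps=4):
    """Effect on y of a single shock at time 0, with mu = 0 and no other shocks."""
    eps = [shock] + [0.0] * steps
    # y_t = eps_t + theta * eps_{t-1}; the error before time 0 is taken as zero.
    return [eps[t] + theta * (eps[t - 1] if t > 0 else 0.0) for t in range(len(eps))]

print(ma1_response(0.7))   # [1.0, 0.7, 0.0, 0.0, 0.0]
print(ma1_response(-0.7))  # [1.0, -0.7, 0.0, 0.0, 0.0]
```

Unlike the AR(1) case, where a shock fades gradually, the MA(1) response is exactly zero from the second period onwards.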
Sometimes AR or MA models on their own are not accurate enough for prediction. This brought the ARMA model into existence
ARMA Model
1. Combination of both AR and MA models
2. Combines p autoregressive terms and q moving average terms. Hence, it is called ARMA(p,q):
$$y_t = \mu + \sum_{i=1}^{p} \gamma_i \, y_{t-i} + \epsilon_t + \sum_{i=1}^{q} \theta_i \, \epsilon_{t-i}$$
The first sum is the autoregressive part and the second sum is the moving average part
ARMA Model – Example
Suppose $y_t = 1$ and $\epsilon_t = 1$ in the current period. The value in the next period is then plotted for two cases:
▪ ARMA(1,1) with $\gamma = 0.8$ and $\theta = 0.7$
▪ ARMA(1,1) with $\gamma = -0.8$ and $\theta = -0.7$
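The two ARMA(1,1) cases can be traced numerically. This is a sketch under the assumptions $\mu = 0$ and no new shocks after the starting period (not stated on the slide); `arma11_path` is a hypothetical helper name.

```python
def arma11_path(gamma, theta, y0=1.0, eps0=1.0, steps=4):
    """Path of y_t = gamma * y_{t-1} + eps_t + theta * eps_{t-1} with mu = 0."""
    path, prev_eps = [y0], eps0
    for _ in range(steps):
        # The current error is zero, so only the lagged value and lagged error matter.
        path.append(gamma * path[-1] + theta * prev_eps)
        prev_eps = 0.0  # no new shocks after the starting period
    return [round(v, 4) for v in path]

print(arma11_path(0.8, 0.7))    # [1.0, 1.5, 1.2, 0.96, 0.768]
print(arma11_path(-0.8, -0.7))  # [1.0, -1.5, 1.2, -0.96, 0.768]
```

The first step combines both effects (0.8 + 0.7 = 1.5). After that the moving average term has dropped out and the path decays like an AR(1) series.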
The ARMA model is valid only if the variables are stationary. Let’s discuss stationarity in detail
Stationarity
1. Modelling an ARMA(p,q) process requires stationarity
2. A stationary process has a mean and variance that do not change over time, and has no trend
3. Consider an AR(1) disturbance process: $\mu_t = \rho \, \mu_{t-1} + \epsilon_t$
4. This process is stationary if $|\rho| < 1$ and $\epsilon_t$ is white noise
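The condition $|\rho| < 1$ can be illustrated by simulation. The sketch below is an assumption-laden illustration: it uses Gaussian errors, starts every path at zero, and measures the variance across many simulated paths at two points in time. The disturbance is written `u` in the code, and `cross_section_variance` is a hypothetical helper name.

```python
import random
import statistics

def cross_section_variance(rho, t, paths=1000, seed=0):
    """Variance across simulated paths of u_t = rho * u_{t-1} + eps_t at time t."""
    rng = random.Random(seed)
    finals = []
    for _path in range(paths):
        u = 0.0  # assumed starting value
        for _step in range(t):
            u = rho * u + rng.gauss(0.0, 1.0)
        finals.append(u)
    return statistics.pvariance(finals)

for rho in (0.5, 1.0):
    early = cross_section_variance(rho, t=50)
    late = cross_section_variance(rho, t=200)
    print(f"rho={rho}: variance at t=50 is {early:.2f}, at t=200 is {late:.2f}")
```

With $\rho = 0.5$ the variance should be roughly the same at both times. With $\rho = 1$ (a random walk) it keeps growing with time, so the process is not stationary.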
Stationarity – An Example
Consider an example of a time series variable:
Here, the time series variable is not stationary, as there is an increasing trend and it’s oscillating over time
Let’s see how to convert non-stationary variables to stationary ones for effective time series modelling
Approaches to Remove Non-Stationarity
Option 01: Detrending
Option 02: Differencing
Detrending
A variable $y_t$ can be detrended by regressing the variable on a time trend and obtaining the residuals:
$$y_t = \mu + \beta t + \epsilon_t$$
Detrended variable: $\hat{\epsilon}_t = y_t - \hat{\mu} - \hat{\beta} t$
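A minimal sketch of detrending by ordinary least squares follows. The series is made up for illustration (a straight line plus a small alternating wiggle), and `detrend` is a hypothetical helper name, not a library function.

```python
def detrend(y):
    """Regress y on a time index t = 0, 1, 2, ... by OLS and return the residuals."""
    n = len(y)
    t = range(n)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    beta = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    mu = y_bar - beta * t_bar
    # Residuals are the detrended variable: y_t - mu_hat - beta_hat * t
    residuals = [yi - (mu + beta * ti) for ti, yi in zip(t, y)]
    return mu, beta, residuals

# Made-up series: the line 2 + 0.5 t plus a small alternating wiggle.
y = [2.0 + 0.5 * t + (0.1 if t % 2 == 0 else -0.1) for t in range(20)]
mu_hat, beta_hat, resid = detrend(y)
print(f"mu_hat={mu_hat:.3f} beta_hat={beta_hat:.3f}")
print(f"residual range: {min(resid):.3f} to {max(resid):.3f}")
```

The estimated slope should be close to 0.5, and the residuals keep only the wiggle around zero with the trend removed.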
Differencing
Uses the concept of a differenced variable:
$$\Delta y_t = y_t - y_{t-1}$$
for first order differences
The variable $y_t$ is integrated of order one, denoted I(1), if taking a first difference produces a stationary process
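Differencing is easy to try on small made-up sequences. The sketch below uses deterministic trends purely for illustration; `difference` is a hypothetical helper name.

```python
def difference(y, order=1):
    """Apply first differencing `order` times: each pass maps y_t to y_t - y_{t-1}."""
    for _ in range(order):
        y = [b - a for a, b in zip(y[:-1], y[1:])]
    return y

linear = [3 + 2 * t for t in range(6)]  # linear trend: 3, 5, 7, 9, 11, 13
quadratic = [t * t for t in range(6)]   # quadratic trend: 0, 1, 4, 9, 16, 25

print(difference(linear))        # [2, 2, 2, 2, 2]
print(difference(quadratic))     # [1, 3, 5, 7, 9]
print(difference(quadratic, 2))  # [2, 2, 2, 2]
```

One difference flattens a linear trend. The quadratic series still trends after one difference and needs a second, which is the idea behind higher order differencing. Each pass also shortens the series by one observation.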
ARIMA(p,d,q) denotes an ARMA model with p autoregressive lags and q moving average lags, applied to a series that has been differenced d times
After a transformation, the time series may still not be stationary. You can check this using the Dickey-Fuller test and, if needed, stationarize the series using higher order differencing
Dickey Fuller Test for Stationarity
Assume an AR(1) model:
$$y_t = \rho y_{t-1} + \epsilon_t$$
The model is non-stationary, or a unit root is present, if $|\rho| = 1$. Subtracting $y_{t-1}$ from both sides:
$$y_t - y_{t-1} = \rho y_{t-1} - y_{t-1} + \epsilon_t$$
$$\Delta y_t = (\rho - 1)\, y_{t-1} + \epsilon_t = \gamma y_{t-1} + \epsilon_t$$
Here $\gamma = \rho - 1$ (not the AR coefficient used earlier). You can test the above model for stationarity by testing the significance of the $\gamma$ coefficient:
▪ If the null hypothesis, $\gamma = 0$, is not rejected, then $y_t$ is not stationary. Difference the variable and repeat the test to see if the differenced variable is stationary
▪ If the null hypothesis is rejected in favour of $\gamma < 0$, then $y_t$ is stationary
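The regression behind the test can be sketched without any libraries. This is a simplified illustration of the basic Dickey-Fuller regression with no constant and no extra lags, not the augmented test that `adfuller` runs in the case study. The helper names and simulated data are assumptions. The statistic is compared with Dickey-Fuller critical values rather than the usual t-table (approximately −1.95 at the 5% level for this no-constant form).

```python
import math
import random

def dickey_fuller_t(y):
    """Estimate gamma in dy_t = gamma * y_{t-1} + eps_t and return (gamma, t-statistic)."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(x * x for x in lagged)
    gamma = sum(x * d for x, d in zip(lagged, dy)) / sxx
    resid = [d - gamma * x for x, d in zip(lagged, dy)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)  # one estimated coefficient
    return gamma, gamma / math.sqrt(s2 / sxx)

def simulate_ar1(rho, n=500, seed=1):
    """Simulated y_t = rho * y_{t-1} + eps_t with Gaussian errors, starting at zero."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y

for rho in (0.5, 1.0):
    gamma_hat, t_stat = dickey_fuller_t(simulate_ar1(rho))
    print(f"rho={rho}: gamma_hat={gamma_hat:.3f}, t-statistic={t_stat:.2f}")
```

For the stationary series ($\rho = 0.5$) the estimate of $\gamma$ should be near −0.5 with a strongly negative statistic, so the unit-root null is rejected. For the random walk the estimate sits near zero.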
Now that you have understood the concept of stationarity, let’s look at the ACF and PACF, which help determine the number of lags to start with in time series modelling
ACF (Auto Correlation Function)
1. ACF is the ratio of the covariance of $y_t$ and $y_{t-k}$ to the variance of the dependent variable $y_t$:
$$\mathrm{ACF}(k) = \rho_k = \frac{\mathrm{Cov}(y_t, y_{t-k})}{\mathrm{Var}(y_t)}$$
2. Gives the gross correlation between $y_t$ and $y_{t-k}$
3. For an AR(1) model, $\mathrm{ACF}(k) = \rho_k = \gamma^k$
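The definition translates directly into a sample estimate. The sketch below simulates an AR(1) series with $\gamma = 0.8$ (Gaussian errors and $\mu = 0$ are assumptions) and compares the sample ACF with $\gamma^k$. `sample_acf` is a hypothetical helper, separate from the `acf` function imported from statsmodels in the case study.

```python
import random

def sample_acf(y, nlags):
    """Sample ACF(k) = Cov(y_t, y_{t-k}) / Var(y_t) for k = 0..nlags."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    var_sum = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / var_sum
        for k in range(nlags + 1)
    ]

# Simulate an AR(1) series with gamma = 0.8 and unit-variance Gaussian errors.
rng = random.Random(42)
gamma = 0.8
y = [0.0]
for _ in range(5000):
    y.append(gamma * y[-1] + rng.gauss(0.0, 1.0))

acf_values = sample_acf(y, 4)
for k, r in enumerate(acf_values):
    print(f"lag {k}: sample ACF = {r:.3f}, gamma**k = {gamma ** k:.3f}")
```

The sample values should track $0.8^k$ closely and shrink gradually with the lag, which is the "tails off" pattern of an AR process.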
PACF (Partial Auto Correlation Function)
1. Simple correlation between $y_t$ and $y_{t-k}$ minus the part explained by the intervening lags:
$$\rho_k^* = \mathrm{Corr}\left[\, y_t - E^*(y_t \mid y_{t-1}, \dots, y_{t-k+1}),\; y_{t-k} \,\right]$$
where $E^*(y_t \mid y_{t-1}, \dots, y_{t-k+1})$ is the minimum mean squared error predictor of $y_t$ by $y_{t-1}, \dots, y_{t-k+1}$
2. For an AR(1) model, the PACF is $\gamma$ for the first lag and zero for all higher lags
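One standard way to obtain the PACF from the ACF is the Durbin-Levinson recursion, sketched below. It is swapped in here for illustration: the case study later calls statsmodels' `pacf` with `method='ols'`, which is a regression-based estimate. `pacf_from_acf` is a hypothetical helper, and the input is the theoretical AR(1) ACF $0.8^k$ rather than real data.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion. rho[k] is the ACF at lag k (rho[0] = 1).

    Returns the PACF at lags 1..K, where K = len(rho) - 1.
    """
    pacf = []
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, len(rho)):
        if k == 1:
            phi_kk = rho[1]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
        # Update the remaining coefficients: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

theoretical_acf = [0.8 ** k for k in range(5)]  # AR(1) with gamma = 0.8
pacf_values = pacf_from_acf(theoretical_acf)
print([round(v, 6) for v in pacf_values])
```

The first value equals the AR coefficient 0.8 and the remaining values are zero up to floating-point error, which is the "cuts off after the first lag" pattern.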
ACF vs PACF Plot of AR(1) Model
Consider the ACF and PACF plots of an AR(1) model with a coefficient of 0.8
▪ ACF: the AR function tails off gradually
▪ PACF: there is no partial correlation between $y_t$ and $y_{t-k}$ beyond the first lag, and hence it cuts off after the first lag
ACF vs PACF Plot of MA(1) Model
Consider the ACF and PACF plots of an MA(1) model with a coefficient of 0.7
▪ ACF: the MA function cuts off after the first lag, as there is nothing significant after the first lag
▪ PACF: the MA function tails off
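The cut-off in the MA(1) ACF follows from a short derivation. It assumes the errors $\epsilon_t$ are white noise with variance $\sigma^2$:

```latex
\begin{aligned}
\mathrm{Var}(y_t) &= \mathrm{Var}(\epsilon_t + \theta \epsilon_{t-1}) = (1 + \theta^2)\,\sigma^2 \\
\mathrm{Cov}(y_t, y_{t-1}) &= \mathrm{Cov}(\epsilon_t + \theta \epsilon_{t-1},\; \epsilon_{t-1} + \theta \epsilon_{t-2}) = \theta\,\sigma^2 \\
\rho_1 &= \frac{\theta}{1 + \theta^2} = \frac{0.7}{1.49} \approx 0.47 \\
\rho_k &= 0 \quad \text{for } k \ge 2
\end{aligned}
```

For $k \ge 2$, $y_t$ and $y_{t-k}$ share no error terms, so their covariance is zero. That is why only the first lag stands out in the ACF plot.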
A Case Study and its Solution Using
Time Series Analysis in Python
Scenario
The dataset below comes from Quandl. It contains the Bank of England’s official statistics on spot exchange rates for the Euro into US dollars. A sample of the dataset is shown below:
Date Value
09-11-2017 0.8603
08-11-2017 0.8631
07-11-2017 0.8639
06-11-2017 0.8631
03-11-2017 0.8608
02-11-2017 0.8567
01-11-2017 0.8608
31-10-2017 0.8584
30-10-2017 0.8601
Task: Forecast the future rates
Importing Libraries
Import the following libraries, which will be used as part of the forecasting process:
import requests, pandas as pd, numpy as np
from pandas import DataFrame
from io import StringIO
import time, json
from datetime import date
from statsmodels.tsa.stattools import adfuller, acf, pacf
# Legacy ARIMA class used throughout this case study. It was removed in
# statsmodels 0.13, so this import needs an older statsmodels release
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.metrics import mean_squared_error
import matplotlib.pylab as plt
# Jupyter magic: render plots inline (notebook only)
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 6
Loading the Data
Load the BOE-XUDLERD.csv file
df_fx_data = pd.read_csv('BOE-XUDLERD.csv')
df_fx_data
Converting to Time Series Data
Convert the Pandas DataFrame into a time series with daily frequency and show the first 5 rows:
df_fx_data['Date'] = pd.to_datetime(df_fx_data['Date'])
indexed_df = df_fx_data.set_index('Date')
ts = indexed_df['Value']
ts.head(5)
Visualize the Raw Data
Visualize the time series to see how the Euro is trending against the US dollar over time
plt.plot(ts)
Resample the Data
Daily data contains too much variation, so first resample the time series by week, taking the mean of each week. Then use this resampled time series to predict the Euro exchange rates against the US Dollar
ts_week = ts.resample('W').mean()
plt.plot(ts_week)
Check for Stationarity
Plot the rolling mean and standard deviation and observe whether they remain constant over time. However, you might not always be able to make such visual inferences. Hence, apply the Dickey-Fuller test to check for stationarity
Check for Stationarity (Contd…)
def test_stationarity(timeseries):
    # Determine rolling statistics over a 52-week window
    rolmean = timeseries.rolling(window=52, center=False).mean()
    rolstd = timeseries.rolling(window=52, center=False).std()
    # Plot rolling statistics
    orig = plt.plot(timeseries, color='blue', label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)
    # Perform Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

test_stationarity(ts_week)
Check for Stationarity – O/P
Inference Drawn Out of Dickey Fuller Test
Because the test statistic is greater than the 5% critical value and the p-value is larger than 0.05, the null hypothesis of the Dickey-Fuller test cannot be rejected. The rolling mean is also not constant over time. Together, these show that the weekly time series is not stationary
Before you can apply ARIMA models for forecasting, you need to transform this time series into a stationary time series
Stationarize the Time Series
You can apply differencing to stationarize your data. Applying a log transformation before differencing may give better results
# Log transform, then take first differences (shift() lags the series by one week)
ts_week_log = np.log(ts_week)
ts_week_log_diff = ts_week_log - ts_week_log.shift()
plt.plot(ts_week_log_diff)
Confirming with Dickey Fuller Test
# Drop the NaN that differencing leaves in the first row before testing
ts_week_log_diff.dropna(inplace=True)
test_stationarity(ts_week_log_diff)
Inference Drawn After Stationarizing the Time Series
The test statistic is significantly less than the 1% critical value, which shows that your time series is now stationary with 99% confidence. Now you can begin to apply statistical models like ARIMA to forecast future Euro exchange rates using this stationarized time series
Defining ACF & PACF
ACF and PACF help determine the p and q model parameters, which you will need later as input for the ARIMA model (d is the order of differencing)
# ACF and PACF of the differenced log series, up to lag 10
lag_acf = acf(ts_week_log_diff, nlags=10)
lag_pacf = pacf(ts_week_log_diff, nlags=10, method='ols')
Plot ACF
# Plot ACF
plt.subplot(121)
plt.plot(lag_acf)
plt.axhline(y=0, linestyle='--', color='gray')
# Dashed bands at ±7.96/sqrt(N); note that the conventional 95% confidence band is ±1.96/sqrt(N)
plt.axhline(y=-7.96/np.sqrt(len(ts_week_log_diff)), linestyle='--', color='gray')
plt.axhline(y=7.96/np.sqrt(len(ts_week_log_diff)), linestyle='--', color='gray')
plt.title('Autocorrelation Function')
Plot PACF
# Plot PACF
plt.subplot(122)
plt.plot(lag_pacf)
plt.axhline(y=0, linestyle='--', color='gray')
plt.axhline(y=-7.96/np.sqrt(len(ts_week_log_diff)), linestyle='--', color='gray')
plt.axhline(y=7.96/np.sqrt(len(ts_week_log_diff)), linestyle='--', color='gray')
plt.title('Partial Autocorrelation Function')
plt.tight_layout()
Inference Drawn Post ACF and PACF Plot
Using the plots, the 'p' and 'q' values can be determined as follows:
1. p: The lag value where the PACF cuts off (drops to 0) for the first time. If you look closely, p=2
2. q: The lag value where the ACF chart crosses the upper confidence interval for the first time. If you look closely, q=1
Plotting the ARIMA Model
Based on the ACF and PACF plots, the values chosen for the ARIMA(p,d,q) model are (2,1,1). Hence, fit the ARIMA model with the order (2,1,1) and plot its fitted values
model = ARIMA(ts_week_log, order=(2, 1, 1))
# disp=-1 silences the optimiser output in the legacy statsmodels ARIMA.fit()
results_ARIMA = model.fit(disp=-1)
plt.plot(ts_week_log_diff)
plt.plot(results_ARIMA.fittedvalues, color='red')
# Residual sum of squares between the fitted values and the differenced log series
plt.title('RSS: %.4f' % sum((results_ARIMA.fittedvalues - ts_week_log_diff)**2))
The ARIMA Plot
Residual Analysis
Print the results of the ARIMA model and plot the residuals
print(results_ARIMA.summary())
# plot residual errors
residuals = DataFrame(results_ARIMA.resid)
residuals.plot(kind='kde')
print(residuals.describe())
ARIMA Model Summary
Residual Plot
A residual plot centred around 0 indicates a good model
Predictions
predictions_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True)
print(predictions_ARIMA_diff.head())
Scaling Predictions
Now that the model is returning the results you want to see, you can scale the model predictions back to the original scale. To do so, undo the first order differencing with a cumulative sum and take the exponent to restore the predictions to their original scale
predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()
# Start every date at the first log value (.iloc replaces the removed .ix accessor)
predictions_ARIMA_log = pd.Series(ts_week_log.iloc[0], index=ts_week_log.index)
predictions_ARIMA_log = predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum, fill_value=0)
predictions_ARIMA = np.exp(predictions_ARIMA_log)
plt.plot(ts_week)
plt.plot(predictions_ARIMA)
plt.title('RMSE: %.4f' % np.sqrt(sum((predictions_ARIMA - ts_week)**2) / len(ts_week)))
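The back-transformation can be checked on a handful of numbers. The sketch below uses only the standard library and takes five values from the Scenario table (oldest first) purely as sample input. It works on the exact differences rather than model fitted values, so the round trip should reproduce the input.

```python
import math
from itertools import accumulate

# Five sample rates from the Scenario table, oldest first (illustrative input only).
rates = [0.8601, 0.8584, 0.8608, 0.8567, 0.8608]

# Forward transform: log, then first differences.
logs = [math.log(r) for r in rates]
log_diffs = [b - a for a, b in zip(logs[:-1], logs[1:])]

# Undo the differencing: cumulative sum of the differences plus the first log value.
restored_logs = [logs[0]] + [logs[0] + c for c in accumulate(log_diffs)]
# Undo the log transform.
restored = [math.exp(v) for v in restored_logs]

print([round(v, 4) for v in restored])
```

The restored values match the input rates. With model output in place of `log_diffs`, the same two steps put fitted differences back on the exchange-rate scale, which is what the cumulative sum and `np.exp` calls do.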
Scaled Output
The RMSE is close to 0, which indicates that the model predictions are accurate
Training & Testing Datasets
Now you need to split the data set into training and testing data sets
# Hold out the last 15 weeks for testing
size = int(len(ts_week_log) - 15)
train, test = ts_week_log[0:size], ts_week_log[size:len(ts_week_log)]
history = [x for x in train]
predictions = list()
Training the Model and Forecasting
You will use the training data set to train the ARIMA model and perform out-of-sample forecasting. Then you will compare the results of your out-of-sample predictions for Euro rates with the actual values from the test dataset
# train, test, history and predictions come from the split in the previous step
print('Printing Predicted vs Expected Values...')
print('\n')
for t in range(len(test)):
    # Refit on all data seen so far, then forecast one step ahead
    model = ARIMA(history, order=(2,1,1))
    model_fit = model.fit(disp=0)
    output = model_fit.forecast()
    yhat = output[0]
    predictions.append(float(yhat))
    obs = test.iloc[t]
    # Reveal the true value to the model only after it has been predicted
    history.append(obs)
    print('predicted=%f, expected=%f' % (np.exp(yhat), np.exp(obs)))
Validating the Model
Validate the model by comparing its out-of-sample predictions for Euro rates with actual values from the test data set and calculating the mean squared error
# Both test and predictions are on the log scale here
error = mean_squared_error(test, predictions)
print('\n')
print('Printing Mean Squared Error of Predictions...')
print('Test MSE: %.6f' % error)
predictions_series = pd.Series(predictions, index=test.index)
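The rolling one-step validation loop can be sketched without ARIMA. In the example below a naive persistence forecast (predict the last observed value) stands in for the fitted model, so it illustrates the evaluation procedure only, not the model from the case study. `walk_forward` and `persistence` are hypothetical names and the series is made up.

```python
def walk_forward(series, n_test, forecast_one_step):
    """Rolling one-step-ahead evaluation over the last n_test observations."""
    history = list(series[:-n_test])
    test = list(series[-n_test:])
    predictions = []
    for actual in test:
        predictions.append(forecast_one_step(history))
        history.append(actual)  # reveal the true value only after predicting it
    mse = sum((p - a) ** 2 for p, a in zip(predictions, test)) / n_test
    return predictions, mse

def persistence(history):
    """Stand-in forecaster: predict that the next value equals the last one seen."""
    return history[-1]

series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]  # made-up data
preds, mse = walk_forward(series, n_test=3, forecast_one_step=persistence)
print(preds)          # [4.0, 7.0, 11.0]
print(round(mse, 4))  # 16.6667
```

Each prediction uses only data available before the value it predicts, which is what makes the error an out-of-sample measure. A persistence baseline like this is also a useful yardstick: a forecasting model is only worth keeping if its test MSE is lower.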
Plotting Forecasted vs Observed Values
fig, ax = plt.subplots()
ax.set(title='Spot Exchange Rate, Euro into USD', xlabel='Date', ylabel='Euro into USD')
ax.plot(ts_week[-60:], 'o', label='observed')
# Predictions are on the log scale, so exponentiate before plotting
ax.plot(np.exp(predictions_series), 'g', label='rolling one-step out-of-sample forecast')
legend = ax.legend(loc='upper left')
legend.get_frame().set_facecolor('w')
Forecasted vs Observed Plot
Summary
▪ Components of Time Series Data
▪ AR Model
▪ MA Model
▪ Stationarity
▪ Differencing
▪ ACF vs PACF