Forecasting Techniques
Module 5
(Refer: Textbook 3: U Dinesh Kumar, Business
Analytics: The Science of Data-Driven Decision Making,
Chapter 13)
Introduction
Forecasting is by far the most important and frequently used application of
predictive analytics.
Forecasting can be very challenging because many factors influence demand and
because of the sheer scale of business, with stock-keeping units (SKUs) running
into several millions.
Example: Amazon.com sells more than 350 million products through its e-
commerce portal. Amazon itself sells about 13 million SKUs and has about 2
million retailers selling their products through Amazon (Ali, 2017). Predicting
demand for these products is important since overstocking can hurt the bottom
line and understocking can result in customer dissatisfaction. Amazon.com may
not stock all the SKUs sold through its portal, since most of them are sold by
its suppliers (online marketplace) directly to customers; but even if it only
has to predict demand for products it sells directly, the number of SKUs is
still 13 million.
TIME-SERIES DATA AND COMPONENTS OF
TIME-SERIES DATA
Time-series data is data on a response variable, Yt (a random variable), such as
demand for a spare part of capital equipment, for a product, or for a service, or
the market share of a brand, observed at different time points t.
A forecasting problem uses time-series data.
The data points or measurements are usually collected at regular intervals and are
arranged in chronological order.
The time-series data can be
Univariate: contains observations of just a single variable.
Example: demand for a product at time t.
Multivariate: contains more than one variable.
Example: demand for a product at time t, price at time t, amount of money spent by the company
on promotion at time t, competitor's price at time t, etc.
From a forecasting perspective, time-series data can be broken into the following
components:
Trend Component (Tt)
Seasonal Component (St)
Cyclical Component (Ct)
Irregular Component (It)
Trend Component (Tt): Trend is the consistent long-term upward or downward movement of
the data over a period of time.
Seasonal Component (St): the repetitive upward or downward movement (or fluctuation)
from the trend that occurs within a calendar year, such as across seasons, quarters,
months, or days of the week; it is measured using the seasonality index.
The upward or downward fluctuation may be caused due to festivals, customs within a society,
school holidays, business practices within the market, such as ‘end of season sale’, and so on.
For example, in India, demand for many products surges during the festival months of
October and November.
A similar pattern exists during December in many countries due to Christmas.
Usually, for a given context, seasonal fluctuation occurs at fixed intervals (such as months,
quarters) known as periodicity of seasonal variation and repeats over time.
Cyclical Component (Ct): is fluctuation around the trend line that happens due to macro-
economic changes such as recession, unemployment, etc.
Cyclical fluctuations have repetitive patterns with a time between repetitions of more than a
year.
Seasonal vs. cyclical:
Seasonal fluctuation occurs at a fixed period within a calendar year, whereas cyclical
fluctuations have a random time between fluctuations.
The periodicity of seasonal fluctuations is constant, whereas the periodicity of cyclical
fluctuations is not.
Irregular Component (It): Irregular component is the white noise or random uncorrelated
changes that follow a normal distribution with a mean value of 0 and constant variance.
The time-series data can be modelled as a sum or a product of the Tt, St, Ct, and It
components.
Additive models
given by Yt= Tt + St + Ct + It
assume that the seasonal and cyclical components are independent of the trend
component.
not very common since in many cases the seasonal component may not be independent
of trend.
Example: At the Indian Institute of Management Bangalore (IIMB) there are many weekend
programs and the number of students enrolled in these programs is fixed. The
demand for food at the canteens of IIMB increases by a fixed quantity on Saturdays. This
increase in demand is additive in nature.
appropriate if the seasonal component remains constant at the level (or mean) and does
not vary with the level of the series
Multiplicative models
given by Yt= Tt × St × Ct × It
more common and are a better fit for many data sets
In many cases, Yt= Tt × St is used because a large data set is needed to estimate the
cyclical component
more appropriate if seasonal variation is correlated with the level (local mean).
Model Selection Summary

Situation                                   Suitable Model
Seasonal variation constant                 Additive Model
Seasonal variation changes with the level   Multiplicative Model
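The distinction in the table can be illustrated with a small synthetic series (a sketch, not data from the textbook): under an additive model the within-cycle swing stays constant, while under a multiplicative model it grows with the level of the series.

```python
# Synthetic illustration (not the book's data): an upward trend combined
# with a two-period seasonal pattern, once additively and once
# multiplicatively.
trend = [100 + t for t in range(12)]  # trend component Tt

# additive: Yt = Tt + St, with St alternating +10 / -10
additive = [trend[t] + (10 if t % 2 == 0 else -10) for t in range(12)]
# multiplicative: Yt = Tt * St, with St alternating 1.1 / 0.9
multiplicative = [trend[t] * (1.1 if t % 2 == 0 else 0.9) for t in range(12)]

def cycle_swing(series, k, period=2):
    """Peak-to-trough range within cycle k."""
    chunk = series[k * period:(k + 1) * period]
    return max(chunk) - min(chunk)

# additive: swing is the same in every cycle; multiplicative: swing grows
print(cycle_swing(additive, 0), cycle_swing(additive, 5))
print(cycle_swing(multiplicative, 0), cycle_swing(multiplicative, 5))
```

This is why a series whose seasonal peaks grow along with the trend is better served by the multiplicative form.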
Forecasting Techniques and Forecasting Accuracy
1. Forecasting Techniques
• Simple Techniques:
a. Moving Average (MA)
b. Exponential Smoothing
These predict future values as functions of past observations.
• Regression-based Models:
a. Auto-Regressive (AR)
b. Moving Average (MA)
c. ARMA (Auto-Regressive Moving Average)
d. ARIMA (Auto-Regressive Integrated Moving Average)
e. ARIMAX (ARIMA with Explanatory Variables)
• Key Insight:
a. Complex models do not always ensure better accuracy.
b. Sometimes simple models outperform advanced ones (Chatfield, 1986).
Topics covered from
Reference Book 3: Chapter 13- 13.1, 13.2
Forecasting Techniques
This material is taken from Chapter 13 of Reference Book 3 (Business
Analytics by U Dinesh Kumar).
FORECASTING TECHNIQUES AND FORECASTING
ACCURACY
There are many forecasting techniques, developed from
different underlying principles.
Simple techniques such as moving average and exponential
smoothing predict the future value of a time series as a
function of past observations.
Regression-based models such as auto-regressive (AR),
moving average (MA), auto-regressive moving average
(ARMA), auto-regressive integrated moving average
(ARIMA), and ARIMA with explanatory variables (ARIMAX)
use more sophisticated regression models to forecast the
future value of a time series.
Usually, many different forecasting techniques such as
moving average, exponential smoothing, and ARIMA are
used for forecasting before selecting the best model.
The model selection may depend on the chosen
forecasting accuracy measure.
The following four forecasting accuracy measures are
frequently used:
1. Mean absolute error
2. Mean absolute percentage error
3. Mean squared error
4. Root mean square error
Mean Absolute Error (MAE)
Mean absolute error (MAE) is the average absolute error and
should be calculated on the validation data set.
Assume that the validation data has n observations and
forecasting is carried out on these n observations using the
model developed.
The mean absolute error is given by

MAE = (1/n) Σ |Yt − Ft|   (summed over t = 1 to n)

where Yt is the actual value of Y at time t and Ft is the
corresponding forecasted value.
Mean Absolute Percentage Error (MAPE)
Mean absolute percentage error (MAPE) is the average of absolute
percentage error.
Assume that the validation data has n observations, and the
forecasting is carried out on these n observations.
The mean absolute percentage error is given by

MAPE = (1/n) Σ |(Yt − Ft)/Yt| × 100   (summed over t = 1 to n)
MAPE is one of the popular forecasting accuracy measures used by
practitioners since it expresses the average error in percentage
terms and is easy to interpret.
Since MAPE is dimensionless it can be used for comparing different
models with varying scales.
Mean Square Error (MSE)
Mean square error is the average of squared error
calculated over the validation data set.
MSE is given by

MSE = (1/n) Σ (Yt − Ft)²   (summed over t = 1 to n)
Lower MSE implies better prediction.
However, it depends on the range of the time-series data.
Root Mean Square Error (RMSE)
Root mean square error (RMSE) is the square root of mean
square error and is given by

RMSE = √[(1/n) Σ (Yt − Ft)²] = √MSE

RMSE and MAPE are the two most popular accuracy
measures of forecasting.
RMSE can be interpreted as the standard deviation of the errors or residuals.
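The four measures above can be sketched in a few lines (the actual/forecast values here are illustrative, not the book's validation data):

```python
# Sketch of the four forecasting accuracy measures; y = actual Yt, f = forecast Ft.
from math import sqrt

def mae(actual, forecast):
    """Mean absolute error: average of |Yt - Ft|."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, as a fraction (multiply by 100 for %)."""
    return sum(abs(y - f) / y for y, f in zip(actual, forecast)) / len(actual)

def mse(actual, forecast):
    """Mean squared error: average of (Yt - Ft)^2."""
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error: square root of MSE."""
    return sqrt(mse(actual, forecast))

actual = [100, 110, 120]     # made-up validation data
forecast = [90, 115, 130]
print(mae(actual, forecast), mse(actual, forecast))  # 25/3 and 75.0
```

Note how MSE/RMSE penalize the two errors of 10 more heavily than MAE does, while MAPE scales each error by the actual value.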
Forecasting Techniques
Moving Average Method
Moving average is one of the simplest forecasting
techniques; it forecasts the future value of a time
series using the average of the N most recent observations:

Ft+1 = (Yt + Yt−1 + … + Yt−N+1)/N

This formula is called Simple Moving Average (SMA)
since the N past observations are given equal weights (1/N).
Weighted Average Method
In a weighted moving average, past observations are given
differential weights.
Generally, the weight decreases as the data becomes older:

Ft+1 = Σ Wk Yk   (summed over the last N periods)

where Wk is the weight given to the value of Y at time k (i.e., to Yk)
and the weights satisfy Σ Wk = 1.
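A minimal sketch of the weighted moving average (the weight values are an illustrative choice, not prescribed by the textbook):

```python
# Weighted moving average forecast: the most recent N observations get
# differential weights that must sum to 1 (newer data weighted more).
def wma_forecast(history, weights):
    """weights are listed oldest-to-newest and apply to the last len(weights) points."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    recent = history[-len(weights):]
    return sum(w * y for w, y in zip(weights, recent))

sales = [50, 55, 53, 60, 62]
# illustrative weights: 0.2, 0.3, 0.5 on the last three months
print(wma_forecast(sales, [0.2, 0.3, 0.5]))  # 0.2*53 + 0.3*60 + 0.5*62 = 59.6
```

Setting all weights to 1/N recovers the simple moving average as a special case.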
Example 1
We Sell Beauty (WSB) is a manufacturer and distributor of health and
beauty products. WSB is interested in forecasting demand for ‘Kesh’,
their shampoo brand which is sold in 100 ml bottles. WSB believes that
the monthly demand for ‘Kesh’ depends on the promotion expenditure
(in thousands of rupees) and whether the competition was on promotion
or not during that month. The data for 48 months (starting from January
2012) is shown in Table 13.1. The table has the quantity of 100 ml
bottles sold during the month, promotion expenses (in thousands of
rupees) incurred by the company, and whether the competition was on
promotion (value of 1 implies that the competition was on promotion
and 0 otherwise). Use simple moving average with N = 12 and forecast
the demand of Kesh for months 37 to 48. Calculate the values of MAPE
and RMSE.
The 12-period moving average forecast for periods t + 1 = 37 to
48 is given by

Ft+1 = (Yt + Yt−1 + … + Yt−11)/12
The forecasted values using the 12-period moving average
and the corresponding RMSE and MAPE calculations are
given in Table 13.2.
The RMSE using the moving average forecast is
734,725.84 and the MAPE is 0.1403 (or
14.03%). The graph of actual and forecasted demand is
shown in Figure 13.2.
In the moving average method, an important decision one has to make is the number
of periods, N.
The forecast accuracy will depend on the chosen N.
If N is small, the average tends to be more sensitive to recent
observations, i.e., more responsive to the recent trend.
So, if responsiveness is important, relatively few data points should be
included.
This enables the moving average to adjust quickly to changes
in the data, though at times it will also respond to mere
random noise in the data.
On the other hand, if N is large, that is, more data points are included, the
forecast will be less sensitive to recent changes in the data.
Since the moving average is always centred on the range of the
data points considered, it lags behind the trend by about (N + 1)/2 time
periods.
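The responsiveness trade-off can be seen in a small sketch (the five-point series below is illustrative, not the Kesh data):

```python
# Simple moving average: the forecast for the next period is the plain
# average of the last N observations (equal weights 1/N).
def sma_forecast(history, N):
    return sum(history[-N:]) / N

sales = [50, 55, 53, 60, 62]
print(round(sma_forecast(sales, 3), 2))  # responsive: (53+60+62)/3 = 58.33
print(round(sma_forecast(sales, 5), 2))  # smoother:   (50+...+62)/5 = 56.0
```

With N = 3 the forecast tracks the recent upswing; with N = 5 it is pulled down by the older, lower observations.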
Time Series Forecasting
SINGLE EXPONENTIAL SMOOTHING (SES)
One of the drawbacks of the simple moving average technique
is that it gives equal weight to all the previous
observations used in forecasting the future value.
This can be overcome by assigning differential weights to
the past observations.
An easy way to assign differential weights is the
single exponential smoothing (SES) technique (also
known as simple exponential smoothing).
Just like the moving average, SES assumes a fairly steady time series with no
significant trend, seasonal, or cyclical component.
Here, the weights assigned to past data decline exponentially, with the most recent
observations assigned higher weights.
In single exponential smoothing, the forecast at time (t + 1) is given by

Ft+1 = αYt + (1 − α)Ft   (13.9)

where α is called the smoothing constant and its value lies between 0 and 1.
Since the model uses one smoothing constant, it is called single exponential
smoothing.
Worked illustration with α = 0.4 and F1 = Y1 = 50
(formula: Ft+1 = 0.4Yt + 0.6Ft):

Month   Sales (units) Yt   SES forecast Ft
Jan     50                 50
Feb     55                 50
Mar     53                 52
Apr     60                 52.4
May     62                 55.44
Jun     ?                  58.064
SMA (N = 3) vs SES (α = 0.4)

Month   Sales (units)   SMA     SES
Jan     50              –       50
Feb     55              –       50
Mar     53              –       52
Apr     60              52.67   52.4
May     62              56.00   55.44
Jun     ?               58.33   58.064
Substituting for Ft recursively in Eq. (13.9), we get

Ft+1 = αYt + α(1 − α)Yt−1 + α(1 − α)²Yt−2 + …   (13.10)

From Eq. (13.10), it is evident that the
weights assigned to older observations
decrease exponentially.
Figure 13.3 shows the rate at which
the weight decreases for older
observations when α = 0.4 and 0.8;
the plot resembles the exponential
decay curve.
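Eq. (13.9) can be sketched directly (the five-point series and α = 0.4 match the small worked illustration earlier, with F1 = Y1):

```python
# Single exponential smoothing: F(t+1) = alpha*Yt + (1 - alpha)*Ft,
# initialized with F1 = Y1.
def ses_forecasts(series, alpha):
    """Return [F1, F2, ..., F(n+1)] for n observations."""
    forecasts = [series[0]]               # F1 = Y1
    for y in series:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts

sales = [50, 55, 53, 60, 62]
print(ses_forecasts(sales, 0.4))  # ≈ [50, 50, 52, 52.4, 55.44, 58.064]
```

Unrolling the loop reproduces the exponentially decaying weights of Eq. (13.10).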
Example
The forecasted values for
months 37 to 48 for the
data in Table 13.1 using
simple exponential
smoothing is shown in
Table 13.3.
Exponential smoothing
uses the entire historical
data.
To begin exponential smoothing, we need an initial forecast
F1 in Eq. (13.9).
We can set F1 = Y1 or use a moving average as the
initial forecast.
The forecasted value for period 2 is then F2 = αY1 + (1 − α)F1.
We will assume F1 is the same as Y1.
Thus, the value of F2 will also be the same as Y1, that is, 3002666.
The forecasted values using single exponential smoothing
with α = 0.2 are shown in Table 13.3.
The RMSE using single exponential smoothing with α =
0.2 is 742,339.22 and the MAPE is 0.1394 (or
13.94%).
Advantages and Disadvantages
Advantages:
It uses all the historic data unlike the moving average where
only the past few observations are considered to predict the
future value.
It assigns progressively decreasing weights to older data.
Disadvantages:
A small smoothing constant α makes the forecast less sensitive to recent
changes in the data (much as a large N does for the moving average).
The forecast always lags behind a trend, as it is based only on past
observations; the smaller the α, the greater the lag, since the method
is slow to recognize shifts in the level of the data points.
Forecast bias and systematic errors occur when the
observations exhibit strong trend or seasonal patterns.
DOUBLE EXPONENTIAL SMOOTHING – HOLT’S
METHOD
One of the drawbacks of single exponential smoothing is that the
model does not do well in the presence of trend
This can be improved by introducing an additional equation for
capturing the trend in the time-series data.
Double exponential smoothing uses two equations to forecast the
future values of the time series: one for forecasting the level
(the short-term average value), Eq. (13.12), and another for
capturing the trend, Eq. (13.13):

Lt = αYt + (1 − α)(Lt−1 + Tt−1)   (13.12)
Tt = β(Lt − Lt−1) + (1 − β)Tt−1   (13.13)

Here α and β are the smoothing constants for level and trend, respectively.
DES – HOLT'S METHOD: Worked Example (α = 0.4, β = 0.9)

Level (smoothed estimate of the current value): Lt = αYt + (1 − α)Ft,
with the initial level taken as the actual value (here L2 = Y2 = 55).
Trend (smoothed estimate of the slope): Tt = β(Lt − Lt−1) + (1 − β)Tt−1,
with the initial trend (Yt − Y1)/(t − 1); here T2 = (55 − 50)/1 = 5.
Forecast for the next period: Ft+1 = Lt + Tt;
forecast for n periods ahead: Ft+n = Lt + nTt.
Sample calculations: F3 = L2 + T2 = 55 + 5 = 60;
T3 = 0.9(57.2 − 55) + 0.1(5) = 2.48.

Month    Sales (units) Yt   Lt         Tt        Ft
1 Jan    50                 –          –         –
2 Feb    55                 55         5         –
3 Mar    53                 57.2       2.48      60
4 Apr    60                 59.808     2.5952    59.68
5 May    62                 62.24192   2.45004   62.4032
The forecast at time t + n is given by

Ft+n = Lt + nTt

where Lt is the level, which represents the smoothed value up to and
including the last data point, Tt is the slope of the line (the rate of increase
or decrease at period t), and n is the number of time periods into the future.
The initial value of Lt is usually taken to be the same as Yt (that is, Lt = Yt).
The starting value of Tt can be taken as (Yt − Yt−1), the difference
between the two actual observations prior to the
period for which forecasting is carried out.
Another option for the initial Tt is (Yt − Y1)/(t − 1).
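A sketch of Holt's method, initialized as in the worked example (level = second observation, trend = difference of the first two; α = 0.4 and β = 0.9 are the example's illustrative values, not prescribed constants):

```python
# Holt's double exponential smoothing: level and trend updates plus
# the one-step forecast F(t+1) = Lt + Tt.
def holt(series, alpha, beta):
    """Return one-step forecasts from period 3 onward plus the next-period forecast."""
    level = series[1]                      # initial level L2 = Y2
    trend = series[1] - series[0]          # initial trend T2 = Y2 - Y1
    forecasts = []
    for y in series[2:]:
        f = level + trend                  # forecast made before seeing y
        forecasts.append(f)
        new_level = alpha * y + (1 - alpha) * f
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    forecasts.append(level + trend)        # forecast for the period after the data
    return forecasts

sales = [50, 55, 53, 60, 62]
print(holt(sales, 0.4, 0.9))  # ≈ [60, 59.68, 62.4032, 64.692]
```

The printed values reproduce the Ft column of the worked table, and the final entry is the forecast for the month after the data ends.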
HOLT'S METHOD: When to Use the Forecast Equation

Use Ft+n = Lt + nTt while forecasting future periods, i.e., after the last
observed data point (when no actual Yt exists).
Do not use it while training: while the actual Yt exists, keep updating
Lt and Tt with Eqs. (13.12) and (13.13); after training, forecast using
Ft+n = Lt + nTt.

Timeline:
|----|----|----|----|----|----|----|----|
Y1  Y2  Y3  ...  Y35 (actual data) | F36  F37  F38 (forecasts)
Use L35 and T35 → F36 = L35 + 1 × T35

Worked illustration with month 6 as the last known point (the forecasts
below imply L6 = 130 and T6 = 3.5):
Forecast for Month 7: F7 = L6 + 1 × T6 = 130 + 3.5 = 133.5,
i.e., predicted sales of 133.5 units.
Forecast for Month 8: F8 = L6 + 2 × T6 = 130 + 7.0 = 137.0,
i.e., predicted sales of 137.0 units.
TRIPLE EXPONENTIAL SMOOTHING (HOLT-WINTER
MODEL)
The moving average and the single and double exponential smoothing techniques
discussed so far can handle data as long as the data do not have any seasonal
component associated with it.
However, when there is seasonality in the time-series data, techniques such as
moving average, exponential smoothing, and double exponential smoothing are
no longer appropriate.
In most cases, the fitted error values (actual demand minus forecast)
associated with simple exponential smoothing and Holt’s method will indicate
systematic error patterns that reflect the existence of seasonality.
For example, presence of seasonality may result in all positive errors, except for
negative values that occur at fixed intervals.
Such pattern in error would imply existence of seasonality.
Such time series data require the use of a seasonal method to eliminate the
systematic patterns in error.
Triple exponential smoothing is used when the data has trend
as well as seasonality.
The following three equations, which account for level, trend,
and seasonality, are used for forecasting (for a multiplicative
model, Winters 1960):

Lt = α(Yt/St−s) + (1 − α)(Lt−1 + Tt−1)
Tt = β(Lt − Lt−1) + (1 − β)Tt−1
St = γ(Yt/Lt) + (1 − γ)St−s

The forecast is Ft+n = (Lt + nTt) × St+n−s, where s is the periodicity of the
seasonality and α, β, γ are the smoothing constants for level, trend, and
seasonality.
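The three update equations can be sketched as below. Initialization is a design choice that varies across textbooks; here (an assumption, not the book's procedure) the first full cycle supplies the starting level and seasonal indices, and the average per-period change between the first two cycles supplies the starting trend.

```python
# Multiplicative Holt-Winters sketch: level, trend, and seasonal updates
# followed by the forecast (Lt + n*Tt) * S(t+n-s).
def holt_winters(series, s, alpha, beta, gamma, horizon):
    level = sum(series[:s]) / s                      # initial level: first-cycle mean
    seasonal = [y / level for y in series[:s]]       # initial seasonal indices
    trend = (sum(series[s:2 * s]) - sum(series[:s])) / (s * s)  # avg per-period change
    for t in range(s, len(series)):
        y = series[t]
        last_level = level
        level = alpha * (y / seasonal[t % s]) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonal[t % s] = gamma * (y / level) + (1 - gamma) * seasonal[t % s]
    n = len(series)
    return [(level + k * trend) * seasonal[(n + k - 1) % s]
            for k in range(1, horizon + 1)]

# a flat series with a clean two-period season: forecasts reproduce the pattern
print(holt_winters([120, 80] * 4, s=2, alpha=0.4, beta=0.3, gamma=0.2, horizon=2))  # ≈ [120.0, 80.0]
```

On this perfectly seasonal, trend-free series the updates leave level, trend, and indices unchanged, so the forecasts simply replay the seasonal pattern.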
Predicting Seasonality Index Using the Method of Averages
The following steps are used to predict the seasonality index using the method of averages:
1. Calculate the average value of Y for each season (that is, if the data is monthly data,
then we need to calculate the average for each month based on the training data). Let
these averages be Ȳ1, Ȳ2, …, Ȳs.
2. Calculate the average of the seasonal averages computed in step 1 (say Ȳ).
3. The seasonality index for season k is given by the ratio Sk = Ȳk/Ȳ.
A variation of this procedure is to first divide each Yt by its yearly
average and then calculate the seasonal averages of these ratios.
We will use first 3 years of data in Table 13.1 to calculate the seasonality index for various
months.
The seasonality index based on first 3 years of data using method of averages is shown in
Table 13.6.
Seasonality index can be interpreted as percentage change from the trend line.
For example, the seasonality index for January is approximately 1.088 or 108.8%.
This implies that in January, the demand will be approximately 8.8% more from the trend
line. The seasonality index for March is 0.8885 or 88.85%.
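The method of averages can be sketched as below (the quarterly numbers are made up so the indices are easy to verify by hand, not the Table 13.1 data):

```python
# Seasonality index by the method of averages: average each season across
# years, then divide by the grand average of those seasonal averages.
def seasonality_indices(series, s):
    season_avgs = [sum(series[k::s]) / len(series[k::s]) for k in range(s)]
    grand_avg = sum(season_avgs) / s
    return [avg / grand_avg for avg in season_avgs]

# three years of quarterly demand (Q1, Q2, Q3, Q4 repeated)
demand = [110, 90, 80, 120, 115, 95, 85, 125, 120, 100, 90, 130]
indices = seasonality_indices(demand, 4)
print([round(i, 3) for i in indices])  # Q1 index = 115/105 ≈ 1.095, i.e. ~9.5% above trend
```

By construction the indices average to 1 (they sum to s), matching the interpretation of the index as a percentage deviation from the trend line.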
Exercise Question
Quarterly demand for certain parts manufactured by
Jack and Jill company is shown in Table 13.32.
(a) Calculate the seasonality index for different quarters using the first 3 years of data.
(b) Develop forecasting models using moving average and single exponential smoothing.
(c) Forecast the demand for 2015 (all four quarters) using moving average and exponential smoothing.
Calculate RMSE, MAPE, and Theil’s coefficient.
Regression model for
Forecasting
Introduction
• What is Regression Forecasting?
– Regression forecasting is used when the future value of a
dependent variable (e.g., sales, demand, etc.) depends on one or
more explanatory (predictor) variables such as price,
advertising spend, or competitor activity.
• Regression is a more appropriate method for forecasting
when the data has values of predictor (explanatory) variables in
addition to the dependent variable Yt.
– Uses known predictor variables (e.g., promotions, competitor
activity)
• Regression is a causal forecasting method.
• It is often more accurate than simple time-series methods such as
exponential smoothing (Parker & Segura, 1971).
Regression Forecasting
• Consider Example 13.1. The regression output from SPSS has two parts:
1. Model summary
2. Coefficients
Model Summary: Interpretation

Metric          Value     Interpretation
R               0.928     Strong correlation
R²              0.862     86.2% of variance explained
Adj. R²         0.853     Model fits well
Std. Error      207,017   Moderate error
Durbin–Watson   1.608     No significant autocorrelation
Why Check Autocorrelation?
• Regression assumes that the error terms are independent. With time-series
data, the residuals are often correlated over time, which makes the standard
errors and significance tests unreliable. The Durbin–Watson statistic tests for
this: values close to 2 (as here, 1.608) indicate no significant autocorrelation
in the residuals.
Coefficients: Interpretation
The regression model (based on the values calculated previously) is given below.
As expected, sales increase as the promotion expenses
increase, and sales decrease whenever the competition is on
promotion.
Forecasting using Regression
• The forecasted values for period 37 to 48 using the regression model.
Advantages of Regression Forecasting
• Incorporates multiple influencing factors
• Offers statistical validation (R², significance tests)
• Provides interpretability (impact of each predictor)
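A sketch of how such a regression forecast can be fitted with ordinary least squares (the data and coefficients below are synthetic stand-ins, not the book's Example 13.1 values):

```python
# Causal regression forecast: demand as a linear function of promotion
# spend and a competitor-on-promotion dummy, fitted by least squares.
import numpy as np

promo = np.linspace(100, 500, 48)                 # promotion expenses (made up)
comp = np.array([0, 1] * 24)                      # competitor promotion dummy
demand = 2000 + 5.0 * promo - 300.0 * comp        # synthetic, exactly linear

X = np.column_stack([np.ones(48), promo, comp])   # intercept + two predictors
beta, *_ = np.linalg.lstsq(X, demand, rcond=None) # ordinary least squares
print(np.round(beta, 2))                          # recovers [2000, 5, -300]

next_month = np.array([1.0, 350.0, 0.0])          # a hypothetical future period
print(float(next_month @ beta))                   # ≈ 2000 + 5*350 = 3750
```

Unlike the smoothing methods, the forecast here needs the future values of the predictors (planned promotion spend, expected competitor behaviour), which is exactly what makes it a causal method.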
Comparison of Forecasting Methods

Forecasting Method             RMSE         MAPE (%)   Remarks
Moving Average                 734,725.84   14.03      Higher error; less accurate
Single Exponential Smoothing   742,339.22   13.94      Similar to moving average; moderate accuracy
Regression Model               302,969.00   4.19       Best performance; significantly lower error
Interpretation:
The regression-based forecasting model achieves a much smaller RMSE and
MAPE compared to moving average and exponential smoothing, indicating
superior predictive accuracy and better fit to the actual data.
Tutorial Question:
(a) Develop a forecasting model using moving average (N=5) and single exponential
smoothing (α=0.5). Calculate the MAPE for both models. Which model gives the
least MAPE?
• Refer: Textbook 3: U Dinesh Kumar, Business Analytics: The
Science of Data-Driven Decision Making,
Chapter 13 (Forecasting Techniques) for the exercise
questions.