Time Series Forecasting Models

ARIMA, SARIMA, ARIMAX

Quang Nguyen
CELG, UEH

2025-03-01

For Courses in Time Series Econometrics and Advanced Quantitative Methods



The Time Series Forecasting Challenge


Goal: Predict future values based on past observations
Challenge: Real-world time series contain multiple patterns:
Long-term trends
Seasonal patterns
Cyclical fluctuations
Irregular movements
External influences


The Fundamental Question


If we observe a pattern like this:

How do we predict what comes next?



ARIMA: The Building Blocks Approach


ARIMA models work by combining three key components:

1. AR - AutoRegressive: Today depends on yesterday
2. I - Integrated: Accounting for trends
3. MA - Moving Average: Learning from past surprises

Let’s understand each component intuitively…


The AR Component: Learning from the Past


Autoregression: Current values depend on previous values
Intuition: Like a ball rolling with momentum
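
The momentum intuition can be written compactly. The following is a minimal sketch of the standard AR(p) form; the symbols (y_t for the series, c for a constant, \phi_i for the autoregressive coefficients, \varepsilon_t for a white-noise error) are notation assumed here, not taken from the slides.

```latex
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t
```

With p = 1 and 0 < \phi_1 < 1, the effect of a shock after k periods is \phi_1^k, so it fades geometrically, much like the rolling ball slowing down.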


The I Component: Handling Trends


Integration/Differencing: Accounts for upward or downward trends
Intuition: Looking at changes rather than absolute values
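
A small sketch of what "looking at changes" does to a trend. The series below are made up for illustration (noise-free so the result is exact); only base R's `diff()` is used.

```r
# Illustrative series with a linear trend: y_t = 5 + 2t (no noise, for clarity)
t_index <- 1:10
y_linear <- 5 + 2 * t_index

# First difference: y_t - y_{t-1}; the trend collapses to a constant slope
d1 <- diff(y_linear, differences = 1)

# A quadratic trend needs two rounds of differencing (d = 2)
y_quad <- t_index^2
d2 <- diff(y_quad, differences = 2)

print(d1)
print(d2)
```

Both differenced series are constant, which is the sense in which the "I" step removes a trend. Each round of differencing also drops one observation from the start of the series.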


The MA Component: Learning from Surprises


Moving Average: Current values depend on recent unexpected changes
Intuition: Adjusting predictions based on recent mistakes
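
As a sketch, "learning from surprises" corresponds to the standard MA(q) form below. The symbols are assumed notation: \mu is the mean of the series, \varepsilon_t is the current forecast error (the "surprise"), and \theta_j weights the surprise from j periods ago, using the plus-sign convention that R follows.

```latex
y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}
```

Unlike an AR term, an MA(q) surprise stops mattering entirely after q periods.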


Combining the Components: ARIMA


ARIMA(p,d,q) combines all three components:
p: How many past values influence today (AR)
d: How many times we difference the data (I)
q: How many past forecast errors matter (MA)
Real-World Analogy: Weather Forecasting
AR Component: “Today was hot, so tomorrow will likely be hot too”
I Component: “Temperatures have been rising all week, so tomorrow will be even hotter”
MA Component: “Yesterday’s forecast was 2° too high, so let’s adjust today’s prediction downward”
ARIMA combines these intuitions into a mathematical model.
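
One common way to write that mathematical model uses the backshift operator B, defined by B y_t = y_{t-1}. The notation below is an assumption of this sketch (the slides do not give a formula): \phi_i are AR coefficients, \theta_j are MA coefficients with R's plus-sign convention, c is a constant, and \varepsilon_t is white noise.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right) (1 - B)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right) \varepsilon_t
```

Reading left to right: (1 - B)^d differences the series d times, the first bracket applies the AR part to the differenced series, and the right-hand side is the MA part.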

What About Seasonality?


Many time series show regular patterns that repeat:

How do we account for these repeating patterns?



SARIMA: Adding Seasonality


SARIMA: Seasonal ARIMA (ARIMA with seasonality)

Notation: ARIMA(p,d,q)(P,D,Q)[m]
(p,d,q): Regular ARIMA components
(P,D,Q): Seasonal equivalents
m: Length of seasonal cycle (e.g., 12 for monthly data)
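
The notation maps onto a multiplicative model. A sketch in backshift form (B y_t = y_{t-1}), with symbols assumed here rather than given in the slides:

```latex
\Phi(B^m)\,\phi(B)\,(1 - B^m)^D (1 - B)^d \, y_t = c + \Theta(B^m)\,\theta(B)\,\varepsilon_t

\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p, \qquad
\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q

\Phi(B^m) = 1 - \Phi_1 B^m - \dots - \Phi_P B^{Pm}, \qquad
\Theta(B^m) = 1 + \Theta_1 B^m + \dots + \Theta_Q B^{Qm}
```

For monthly data (m = 12) with D = 1, the seasonal difference is (1 - B^{12}) y_t = y_t - y_{t-12}, that is, each month compared with the same month one year earlier.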


SARIMA: Two Layers of Patterns


SARIMA: Real-World Example


Monthly retail sales:
Regular pattern: This month’s sales relate to last month’s (AR)
Seasonal pattern: December is always high (seasonal AR)
Regular adjustment: Correcting for recent forecast errors (MA)
Seasonal adjustment: Learning from last year’s holiday season errors
(seasonal MA)


What About External Factors?


Time series often respond to outside influences:
Energy consumption during an economic recession
Product sales during a marketing campaign
Traffic patterns during construction

How do we incorporate these external factors?


ARIMAX: Adding External Variables


ARIMAX: ARIMA with eXogenous variables (ARIMA plus other, external factors)

Intuition: Combining time patterns with external influences


Examples of external factors:
Economic indicators (GDP, unemployment)
Policy changes (tax rates, regulations)
Crises (pandemics, natural disasters)
Planned events (holidays, promotions)
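
One common formulation, and the one R's forecast package reports as "Regression with ARIMA errors", adds the external variables as a regression on top of an ARIMA error process. The symbols are assumed notation for this sketch: x_{i,t} is the i-th external variable at time t, \beta_i is its effect, and \eta_t is the part of the series left to the time-series model.

```latex
y_t = \beta_1 x_{1,t} + \beta_2 x_{2,t} + \dots + \beta_k x_{k,t} + \eta_t,
\qquad \eta_t \sim \text{ARIMA}(p, d, q)
```

When x_{i,t} is a 0/1 indicator for an event such as a recession, \beta_i can be read as the average shift in the series while the event is active.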


ARIMAX: Visual Intuition


When to Use Each Model?


ARIMA: When data shows:
No clear seasonality and persistent patterns/trends
Example: Daily stock prices
SARIMA: When data shows:
Regular seasonal patterns
Example: Monthly energy consumption
ARIMAX: When data is affected by:
Known external events or measurable external factors
Example: Energy consumption during recessions/pandemics

Comparing Model Capabilities


Feature                        ARIMA   SARIMA   ARIMAX
Accounts for trends            Yes     Yes      Yes
Captures persistence           Yes     Yes      Yes
Models forecast errors         Yes     Yes      Yes
Handles seasonality            No      Yes      Possible
Incorporates external factors  No      No       Yes


Real-World Analogies
ARIMA: Like forecasting your daily commute time based only on how long it took the past few days
SARIMA: Like forecasting your commute time knowing that Monday mornings and Friday evenings always have different patterns
ARIMAX: Like forecasting your commute time considering both past patterns AND knowing about planned road construction


Illustration: Energy Consumption Example


Let’s consider energy consumption forecasting:
ARIMA: “Energy use tomorrow depends on today’s use plus recent trends”
SARIMA: “January is always high due to heating, July is high due to
cooling - these seasonal patterns repeat annually”
ARIMAX: “Energy consumption follows its usual patterns BUT drops
significantly during economic recessions and the COVID-19 pandemic”
Model selection should be guided by (1) the characteristics of your data;
(2) the forecasting goal; (3) the availability of external information


Illustration: Energy Consumption Example


Real-world application using U.S. energy consumption data (1973-2024)
Step-by-step modeling process:
Data exploration and properties assessment
Model specification and diagnostics
Forecasting and validation
Handling structural breaks and exogenous shocks


Primary Energy Consumption Data


library(forecast)  # autoplot() for ts objects, Arima(), auto.arima()
library(ggplot2)   # ggtitle(), ylab(), theme_minimal()

# Load energy consumption data
energy_data <- read.csv("enercons_quadbtu.csv")

# Create a time series object starting from January 1973 with monthly frequency
energy_ts <- ts(energy_data$[Link],
                start=c(1973, 1), frequency=12)

# Time series plot
autoplot(energy_ts) +
  ggtitle("Monthly U.S. Total Primary Energy Consumption (1973-2024)") +
  ylab("Quadrillion BTU") +
  theme_minimal()


Seasonal Patterns in Energy Consumption


# Seasonal plot
ggseasonplot(energy_ts, year.labels=TRUE, year.labels.left=TRUE) +
  ggtitle("Seasonal Plot: U.S. Energy Consumption") +
  ylab("Quadrillion BTU")


Time Series Decomposition


# Time series decomposition
energy_decomp <- stl(energy_ts, s.window="periodic")
autoplot(energy_decomp) +
  ggtitle("Decomposition of U.S. Energy Consumption")


Stationarity Assessment
Before fitting ARIMA models, we need to check for stationarity
Non-stationary time series need to be transformed (differencing)
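library(tseries)  # provides kpss.test()

# KPSS test for stationarity (H0: the series is level-stationary)
kpss.test(energy_ts)

KPSS Test for Level Stationarity

data: energy_ts
KPSS Level = 7.001, Truncation lag parameter = 6, p-value = 0.01

# p-value = 0.01 < 0.05: reject H0, so the series is non-stationary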


Differencing for Stationarity


par(mfrow=c(2,1), mar=c(4,4,2,1))

# Plot original series and first difference
plot(energy_ts, main="Original Series", ylab="Quad BTU")

# Regular differencing (first difference)
energy_diff1 <- diff(energy_ts, differences=1)
plot(energy_diff1, main="First Difference", ylab="Change in Quad BTU")
par(mfrow=c(1,1))


Seasonal Differencing
# Seasonal differencing and both regular+seasonal
par(mfrow=c(2,1), mar=c(4,4,2,1))

# Seasonal differencing
energy_season_diff <- diff(energy_ts, lag=12)
plot(energy_season_diff, main="Seasonal Difference (lag=12)",
     ylab="Seasonal Change")

# Both regular and seasonal differencing
energy_both_diff <- diff(diff(energy_ts, lag=12), differences=1)
plot(energy_both_diff, main="Regular + Seasonal Differencing",
     ylab="Change")
par(mfrow=c(1,1))
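
To see what each differencing step removes, it can help to apply the same calls to a series whose structure is known. The sketch below uses a made-up, noise-free monthly series (not the energy data), so the results are exact.

```r
# Illustrative monthly series: linear trend plus a fixed 12-month pattern
# (4 years, no noise, so the effect of each differencing step is exact)
seasonal_pattern <- c(5, 4, 3, 1, 0, 2, 4, 3, 1, 0, 2, 5)
x <- ts(0.5 * (1:48) + rep(seasonal_pattern, times = 4),
        start = c(2000, 1), frequency = 12)

# Seasonal difference: x_t - x_{t-12} removes the repeating pattern,
# leaving the yearly trend increment (0.5 * 12 = 6)
x_seasonal <- diff(x, lag = 12)

# A further regular difference removes what is left of the trend
x_both <- diff(x_seasonal, differences = 1)

length(x)           # 48
length(x_seasonal)  # 36: the first 12 observations are lost
length(x_both)      # 35: one more observation is lost
```

The combined operation costs 13 observations on monthly data, which matters little for a series of several hundred months but can be significant for short series.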


ACF and PACF Analysis


par(mfrow=c(2,1))
Acf(energy_both_diff, main="ACF of Regular+Seasonal Differenced Series", lag.max=36)
Pacf(energy_both_diff, main="PACF of Regular+Seasonal Differenced Series", lag.max=36)
par(mfrow=c(1,1))


ARIMA Models
ARIMA(p,d,q) (AutoRegressive Integrated Moving Average) models combine:
AR(p) autoregressive terms,
I(d) differencing, and
MA(q) moving average terms.
Estimation approach:
Automatic selection using information criteria
Manual specification based on ACF/PACF patterns
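
For manual specification, it helps to know the textbook ACF/PACF signatures of pure AR and MA processes. The sketch below computes them exactly with base R's `ARMAacf()`; the coefficient value 0.7 is hypothetical, chosen only to make the patterns easy to see.

```r
# Theoretical ACF/PACF for two illustrative processes (coefficients are
# hypothetical, chosen only to make the patterns easy to see)
ar1_acf  <- ARMAacf(ar = 0.7, lag.max = 5)               # lags 0..5
ar1_pacf <- ARMAacf(ar = 0.7, lag.max = 5, pacf = TRUE)  # lags 1..5
ma1_acf  <- ARMAacf(ma = 0.7, lag.max = 5)               # lags 0..5
ma1_pacf <- ARMAacf(ma = 0.7, lag.max = 5, pacf = TRUE)  # lags 1..5

# AR(1): ACF decays geometrically (0.7^k), PACF cuts off after lag 1
round(ar1_acf, 3)
round(ar1_pacf, 3)

# MA(1): ACF cuts off after lag 1, PACF decays with alternating sign
round(ma1_acf, 3)
round(ma1_pacf, 3)
```

As a rule of thumb, a PACF that cuts off after lag p suggests AR(p), and an ACF that cuts off after lag q suggests MA(q). Sample plots from real data are noisier, so these patterns are a starting point rather than a decision rule.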


ARIMA Model Comparison


library(knitr)     # kable()
library(magrittr)  # %>% pipe

# Automatic ARIMA selection
auto_arima <- auto.arima(energy_ts, seasonal=FALSE,
                         stepwise=TRUE, approximation=FALSE)

# Manual ARIMA selection (example models)
arima1 <- Arima(energy_ts, order=c(1,1,1))
arima2 <- Arima(energy_ts, order=c(2,1,2))
arima3 <- Arima(energy_ts, order=c(1,1,2))

# Compare models using AIC/BIC
models <- list(auto_arima, arima1, arima2, arima3)
AICs <- sapply(models, function(x) x$aic)
BICs <- sapply(models, function(x) x$bic)
data.frame(Model=c(paste0("ARIMA(", toString(arimaorder(auto_arima)), ")"),
                   "ARIMA(1,1,1)", "ARIMA(2,1,2)", "ARIMA(1,1,2)"),
           AIC=round(AICs, 2), BIC=round(BICs, 2)) %>% kable()


ARIMA Model Comparison


Model          AIC      BIC
ARIMA(1,1,5)   649.67   680.70
ARIMA(1,1,1)   797.53   810.83
ARIMA(2,1,2)   784.73   806.89
ARIMA(1,1,2)   782.21   799.94


ARIMA Model Diagnostics


# Select best ARIMA model based on AIC
best_arima_index <- which.min(AICs)
best_arima <- models[[best_arima_index]]

# Check residuals of the best ARIMA model
checkresiduals(best_arima)

Ljung-Box test

data: Residuals from ARIMA(1,1,5)


Q* = 688.63, df = 18, p-value < 2.2e-16

Model df: 6. Total lags used: 24
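
The Ljung-Box test has the null hypothesis that the residuals are uncorrelated, so the very small p-value here indicates that the non-seasonal ARIMA(1,1,5) leaves autocorrelation unexplained. The degrees of freedom are the lags used minus the number of estimated ARMA coefficients (24 - 6 = 18). A minimal base-R sketch of the same test follows, run on the built-in `AirPassengers` series as a stand-in, since the energy data file is not available here.

```r
# Illustration on a built-in monthly series (not the energy data)
y <- log(AirPassengers)

# A non-seasonal ARIMA(1,1,1) deliberately ignores the 12-month pattern
fit <- arima(y, order = c(1, 1, 1))

# Ljung-Box test on the residuals; fitdf = number of estimated ARMA
# coefficients (here 2), so df = lag - fitdf = 24 - 2 = 22
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 2)
print(lb)
```

As with the energy series, ignoring an obvious seasonal pattern leaves strongly autocorrelated residuals and the test rejects, which is the usual signal to move to a seasonal model.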


SARIMA Models
SARIMA(p,d,q)(P,D,Q)[m] (Seasonal ARIMA) extends ARIMA by adding seasonal components:
(p,d,q): Non-seasonal components
(P,D,Q): Seasonal components
P: Seasonal autoregressive order
D: Seasonal differencing order
Q: Seasonal moving average order
m: Length of the seasonal cycle (12 for monthly data)


SARIMA Model Comparison


# Automatic SARIMA selection
auto_sarima <- auto.arima(energy_ts, seasonal=TRUE,
                          stepwise=TRUE, approximation=FALSE)

# Manual SARIMA selection
sarima1 <- Arima(energy_ts, order=c(1,1,1), seasonal=c(1,1,1))
sarima2 <- Arima(energy_ts, order=c(0,1,1), seasonal=c(0,1,1))
sarima3 <- Arima(energy_ts, order=c(1,1,1), seasonal=c(0,1,1))

# Compare SARIMA models
sarima_models <- list(auto_sarima, sarima1, sarima2, sarima3)
sarima_AICs <- sapply(sarima_models, function(x) x$aic)
sarima_BICs <- sapply(sarima_models, function(x) x$bic)


SARIMA Model Comparison


Model                      AIC       BIC
SARIMA(2,0,1)(2,1,2)[12]   -356.65   -321.33
SARIMA(1,1,1)(1,1,1)[12]   -332.55   -310.49
SARIMA(0,1,1)(0,1,1)[12]   -310.53   -297.29
SARIMA(1,1,1)(0,1,1)[12]   -332.09   -314.44


SARIMA Model Diagnostics


# Select best SARIMA model
best_sarima_index <- which.min(sarima_AICs)
best_sarima <- sarima_models[[best_sarima_index]]

# Check residuals
checkresiduals(best_sarima)

Ljung-Box test

data: Residuals from ARIMA(2,0,1)(2,1,2)[12]


Q* = 28.618, df = 17, p-value = 0.03822

Model df: 7. Total lags used: 24


SARIMA Fitted Values


# Plot fitted values against actual values
autoplot(energy_ts) +
  autolayer(fitted(best_sarima), series="Fitted") +
  ggtitle("Actual vs. Fitted Values - SARIMA") +
  ylab("Quadrillion BTU") +
  theme_minimal()


ARIMAX Models
ARIMAX (ARIMA with eXogenous variables) extends ARIMA by incorporating
external covariates
Useful for modeling exogenous shocks (known events that impact the
series) to energy consumption:
Economic recessions
COVID-19 pandemic
Policy changes
Weather events


ARIMAX Models
Let’s generate dummy variables to account for the economic impacts of the
2008 recession and the COVID-19 pandemic.
# Create dummy variables for recession periods
time_points <- time(energy_ts)
recession_2008 <- rep(0, length(energy_ts))
recession_2008[which(time_points >= 2008 & time_points < 2010)] <- 1

covid_2020 <- rep(0, length(energy_ts))
covid_2020[which(time_points >= 2020 & time_points < 2021.5)] <- 1

# Fit ARIMAX model with external regressors
arimax_model <- auto.arima(energy_ts,
                           seasonal=TRUE,
                           xreg=cbind(recession_2008, covid_2020))
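
The windows above are defined on fractional years from `time()` (for example, July 2021 is 2021.5), which depends on floating-point comparisons at the window edges. As an alternative sketch, the same indicators can be built from calendar dates. The observation count of 624 (January 1973 through December 2024) is an assumption based on the slide titles, not something confirmed by the data file.

```r
# Assumption: 624 monthly observations, January 1973 through December 2024
n_obs <- 624
dates <- seq(as.Date("1973-01-01"), by = "month", length.out = n_obs)

# 1 inside the event window, 0 elsewhere (same windows as the slide code)
recession_2008 <- as.integer(dates >= as.Date("2008-01-01") &
                             dates <  as.Date("2010-01-01"))
covid_2020     <- as.integer(dates >= as.Date("2020-01-01") &
                             dates <  as.Date("2021-07-01"))

# Regressor matrix in the shape expected by the xreg argument
xreg <- cbind(recession_2008, covid_2020)

colSums(xreg)  # months flagged per event
```

Counting the flagged months (24 for the recession window, 18 for the COVID window) is a quick sanity check that the windows cover the intended periods.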


ARIMAX Model Summary


# Summary table for ARIMAX
summary(arimax_model)

Series: energy_ts
Regression with ARIMA(2,0,1)(2,1,1)[12] errors

Coefficients:
         ar1      ar2      ma1    sar1     sar2     sma1  recession_2008  covid_2020
      1.3755  -0.3820  -0.8186  0.0398  -0.1809  -0.8026         -0.1380     -0.2707
s.e.  0.0661   0.0639   0.0442  0.0475   0.0452   0.0308          0.1003      0.0986

sigma^2 = 0.03138: log likelihood = 187.23


AIC=-356.46 AICc=-356.16 BIC=-316.73

Training set error measures:
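
The reported AIC can be reproduced by hand from the log likelihood, which shows how the penalty for model size enters. This sketch assumes the usual convention that the parameter count is the number of estimated coefficients plus one for the innovation variance.

```r
# Values copied from the summary output above
log_lik <- 187.23
n_coef  <- 8           # ar1, ar2, ma1, sar1, sar2, sma1, recession_2008, covid_2020
k       <- n_coef + 1  # plus the innovation variance sigma^2

# AIC = -2 * logLik + 2 * k
aic <- -2 * log_lik + 2 * k
aic
```

The result matches the reported AIC of -356.46. Because differencing changes the data on which the likelihood is computed, AIC values are generally comparable only among models with the same orders of differencing (d and D).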


ARIMAX Coefficients Interpretation


Recession 2008:
Coefficient: -0.1380
During the 2008 recession window, monthly energy consumption was approximately 0.14 quadrillion BTU lower on average
COVID-19 Pandemic:
Coefficient: -0.2707
During the COVID-19 window, monthly energy consumption was approximately 0.27 quadrillion BTU lower on average
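
The standard errors in the summary output indicate how precisely each effect is estimated. A rough check, using a normal approximation (an assumption of this sketch), divides each coefficient by its standard error and forms an approximate 95% interval.

```r
# Estimates and standard errors copied from the ARIMAX summary output
coefs <- c(recession_2008 = -0.1380, covid_2020 = -0.2707)
ses   <- c(recession_2008 =  0.1003, covid_2020 =  0.0986)

# Approximate z-ratios and 95% intervals (normal approximation)
z_ratio <- coefs / ses
ci_low  <- coefs - 1.96 * ses
ci_high <- coefs + 1.96 * ses

round(cbind(estimate = coefs, z = z_ratio, ci_low, ci_high), 3)
```

Under this approximation the COVID-19 interval excludes zero, while the 2008 recession interval includes it, so the recession effect is estimated less precisely and should be interpreted with more caution.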


ARIMA vs SARIMA vs ARIMAX: Performance During Crisis Periods


ARIMA vs SARIMA vs ARIMAX: Performance During Crisis Periods
Crisis Periods Comparison
         2008 Recession             COVID-19
Model    [Link]  [Link]  [Link]    [Link]  [Link]  [Link]
ARIMA    0.3736  0.4535  4.8424    0.3699  0.5098  5.1591
SARIMA   0.1291  0.1708  1.6646    0.2167  0.2943  2.9904
ARIMAX   0.1372  0.1779  1.7565    0.2051  0.2724  2.8466


ARIMA vs SARIMA vs ARIMAX: Performance During Crisis Periods
ARIMAX shows a modest improvement over SARIMA during COVID-19 but is slightly worse during the 2008 recession, and the visual difference is subtle
ARIMA models perform notably worse at capturing sudden economic shocks
COVID-19 presented unique forecasting challenges with deeper, more abrupt demand destruction
While seasonal patterns explain much of the variation, external regressors capture structural breaks not accounted for by seasonal patterns


Forecasting Comparison
# Generate forecasts
arima_fc <- forecast(best_arima, h=24)
sarima_fc <- forecast(best_sarima, h=24)

# Generate ARIMAX forecast
future_recession <- rep(0, 24) # Assumption: no future recession
future_covid <- rep(0, 24) # Assumption: no future COVID impact

arimax_fc <- forecast(arimax_model, h=24,
                      xreg=cbind(future_recession, future_covid))
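
Because an ARIMAX forecast requires future values of the regressors, it naturally supports scenario analysis: the same fitted model can be forecast under different assumed futures. The sketch below illustrates this with base R's `arima()` and `predict()` on simulated stand-in data. The series, the shock window, and the effect size are all hypothetical, and `newxreg` plays the role that `xreg` plays in `forecast()`.

```r
set.seed(42)

# Simulated stand-in data (not the energy series): AR(1) noise plus a
# level drop of 2 units during a hypothetical shock in periods 61-72
n <- 120
shock <- as.integer(seq_len(n) %in% 61:72)
y <- as.numeric(arima.sim(list(ar = 0.5), n = n)) + 10 - 2 * shock

# Regression with AR(1) errors; the dummy enters through xreg
fit <- arima(y, order = c(1, 0, 0), xreg = cbind(shock = shock))

# Two 12-step scenarios: no shock vs. a shock throughout the horizon
h <- 12
fc_base  <- predict(fit, n.ahead = h, newxreg = cbind(shock = rep(0, h)))$pred
fc_shock <- predict(fit, n.ahead = h, newxreg = cbind(shock = rep(1, h)))$pred

# The scenarios differ by the estimated shock coefficient at every horizon
beta_shock <- unname(coef(fit)["shock"])
range(as.numeric(fc_shock - fc_base) - beta_shock)
```

The gap between the two scenario paths equals the estimated dummy coefficient (up to floating-point error), which is why setting the future indicators to zero amounts to a "no new crisis" assumption.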


Model Validation: Out-of-Sample Testing


# Split data into training and test sets (using last 24 months as test)
train_end_year <- 2022
train_end_month <- 12
train <- window(energy_ts, end=c(train_end_year, train_end_month))
test <- window(energy_ts, start=c(train_end_year+1, 1))

# Apply the selected models to the training data
# (model= reuses the previously estimated coefficients without re-estimation)
train_arima <- Arima(train, model=best_arima)
train_sarima <- Arima(train, model=best_sarima)

# Create training period recession and COVID indicators
train_recession <- recession_2008[1:length(train)]
train_covid <- covid_2020[1:length(train)]

# Test period indicators
test_recession <- rep(0, length(test)) # Assuming no recession in test period
test_covid <- rep(0, length(test)) # Assuming no COVID impact in test period
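
Passing `model=` to `Arima()` applies a previously fitted model to new data without re-estimating its parameters, so coefficients estimated on the full sample carry information from the test period. A stricter alternative is to re-estimate on the training window only. The sketch below shows that pattern with base R on the built-in `AirPassengers` series as a stand-in for the energy data; the model order is chosen for illustration only.

```r
# Illustration on a built-in monthly series (not the energy data)
y <- log(AirPassengers)                 # January 1949 to December 1960

# Hold out the last 24 months
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))

# Re-estimate the coefficients using the training window only
fit_train <- arima(train, order = c(0, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the held-out period and score it
fc <- predict(fit_train, n.ahead = length(test))$pred
rmse_test <- sqrt(mean((test - fc)^2))
rmse_test
```

With this setup the test observations influence neither the coefficient estimates nor the forecasts, so the resulting error is a cleaner estimate of out-of-sample performance.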


Forecast Evaluation


Forecast Accuracy Metrics


Model    RMSE     MAE      MAPE (%)   AIC
ARIMA    0.4602   0.4077   5.28       649.67
SARIMA   0.1770   0.1235   1.56       -356.65
ARIMAX   0.1879   0.1247   1.58       -356.46
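
For reference, the three error metrics in the table can be computed as follows. This is a minimal sketch with hand-written helper functions (the names `rmse`, `mae`, and `mape` are chosen here, not taken from the slides) and toy numbers rather than the energy data.

```r
# Forecast accuracy metrics; `actual` and `predicted` are numeric vectors
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))
mape <- function(actual, predicted) 100 * mean(abs((actual - predicted) / actual))

# Toy numbers for illustration only (not the energy data)
actual    <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 36)

rmse(actual, predicted)  # sqrt(8.25), about 2.872
mae(actual, predicted)   # 2.75
mape(actual, predicted)  # 12.5 (percent)
```

RMSE penalizes large errors more heavily than MAE and is never smaller than it, while MAPE is scale-free, which makes it convenient for comparing across series but unstable when actual values are near zero.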


Structural Breaks in Energy Consumption


library(strucchange)  # efp(), breakpoints()

# Perform CUSUM test for structural breaks
cusum_test <- efp(energy_ts ~ time(energy_ts), type = "Rec-CUSUM")
plot(cusum_test, main = "CUSUM Test for Structural Breaks")


Detected Structural Breaks


# Bai-Perron test for multiple structural breaks
bp_test <- breakpoints(energy_ts ~ time(energy_ts))

# Plot the breaks
plot(energy_ts, main = "Energy Consumption with Detected Breakpoints")
lines(fitted(bp_test, breaks = bp_test$nbreaks), col = "red")
break_years <- time(energy_ts)[bp_test$breakpoints]
abline(v = break_years, col = "blue", lty = 2)


Key Findings & Implications


Best model for energy consumption: SARIMA, with an RMSE of 0.1770
Seasonal patterns are significant and must be accounted for
Structural breaks correspond to major economic events such as the oil crises (1970s), economic recessions, and the COVID-19 pandemic
Modeling external shocks improves accuracy during the COVID-19 period, although ARIMAX does not beat SARIMA on overall forecast accuracy here (RMSE 0.1879 vs 0.1770)
Implications for policy:
Energy forecasting needs to account for both seasonal patterns and unexpected shocks
Models should be periodically updated to capture structural changes

Model Selection Guidelines


ARIMA is suitable when data shows no significant seasonal patterns
SARIMA is preferred when strong seasonal patterns exist (e.g., monthly
energy consumption)
ARIMAX is most appropriate when:
Known external factors influence the series
Historical shocks need to be modeled explicitly
Future scenarios involving potential shocks are being evaluated
