Time Series Forecasting Models
ARIMA, SARIMA, ARIMAX
Quang Nguyen
CELG, UEH
2025-03-01
For Courses in Time Series Econometrics and Advanced Quantitative Methods
The Time Series Forecasting Challenge
Goal: Predict future values based on past observations
Challenge: Real-world time series contain multiple patterns:
Long-term trends
Seasonal patterns
Cyclical fluctuations
Irregular movements
External influences
The Fundamental Question
If we observe a pattern like this:
How do we predict what comes next?
ARIMA: The Building Blocks Approach
ARIMA models work by combining three key components:
1. AR - AutoRegressive: Today depends on yesterday
2. I - Integrated: Accounting for trends
3. MA - Moving Average: Learning from past surprises
Let’s understand each component intuitively…
The AR Component: Learning from the Past
Autoregression: Current values depend on previous values
Intuition: Like a ball rolling with momentum
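The momentum idea can be sketched with a simulated AR(1) series. This is only an illustration under stated assumptions: the persistence value 0.7, the seed, and the variable names are arbitrary choices, and base R's arima() is used rather than any function from the slides.

```r
# Simulate an AR(1) process: y_t = phi * y_{t-1} + e_t
set.seed(123)                      # arbitrary seed, for reproducibility
phi_true <- 0.7                    # illustrative persistence parameter (assumption)
y <- arima.sim(model = list(ar = phi_true), n = 2000)

# Fit an AR(1) with base R; order = c(p, d, q)
fit <- arima(y, order = c(1, 0, 0))
phi_hat <- unname(coef(fit)["ar1"])
mu_hat  <- unname(coef(fit)["intercept"])   # arima() reports the series mean here

# One-step-ahead forecast by hand: pull the last value toward the mean
manual_fc <- mu_hat + phi_hat * (y[length(y)] - mu_hat)
model_fc  <- as.numeric(predict(fit, n.ahead = 1)$pred)

print(c(phi_hat = phi_hat, manual_fc = manual_fc, model_fc = model_fc))
```

The estimated coefficient should land close to the simulated value, and the hand-computed forecast should agree with the one from predict().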
The I Component: Handling Trends
Integration/Differencing: Accounts for upward or downward trends
Intuition: Looking at changes rather than absolute values
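A small sketch of what differencing does, assuming a simulated random walk with drift (the drift of 0.5, the seed, and the variable names are illustrative, not from the slides):

```r
# A random walk with drift trends upward, so its level is non-stationary
set.seed(42)                       # arbitrary seed
drift <- 0.5                       # illustrative drift per period (assumption)
n <- 500
y <- cumsum(drift + rnorm(n))      # level series

# First difference: the change from one period to the next
dy <- diff(y)

# The changes fluctuate around the drift instead of trending
print(mean(dy))

# Differencing loses no information: cumulating the changes
# from the starting level rebuilds the original series
y_rebuilt <- y[1] + c(0, cumsum(dy))
print(all.equal(y, y_rebuilt))
```

"Integrated" refers to this reverse step: the level is the cumulative sum of the stationary changes.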
The MA Component: Learning from Surprises
Moving Average: Current values depend on recent unexpected changes
Intuition: Adjusting predictions based on recent mistakes
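In symbols, one common way to write an MA(q) model is the following, where the notation is an assumption of this sketch: $\varepsilon_t$ denotes the one-step forecast error (the "surprise") at time $t$, $\mu$ the mean of the series, and $\theta_1, \dots, \theta_q$ the weights on past surprises.

```latex
y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}
```

With q = 1, a positive $\theta_1$ means that an under-prediction yesterday ($\varepsilon_{t-1} > 0$) raises today's prediction.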
Combining the Components: ARIMA
ARIMA(p,d,q) combines all three components:
p: How many past values influence today (AR)
d: How many times we difference the data (I)
q: How many past forecast errors matter (MA)
Real-World Analogy: Weather Forecasting
AR Component: “Today was hot, so tomorrow will likely be hot too”
I Component: “Temperatures have been rising all week, so tomorrow
will be even hotter”
MA Component: “Yesterday’s forecast was 2° too high, so let’s adjust
today’s prediction downward”
ARIMA combines these intuitions into a mathematical model.
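One standard way to write the combined model uses the backshift operator $B$, defined by $B y_t = y_{t-1}$. The symbols here are notation assumed for this sketch rather than taken from the slides: $\phi_i$ are the AR weights, $\theta_j$ the MA weights, $c$ a constant, and $\varepsilon_t$ the forecast error.

```latex
\phi(B)\,(1 - B)^d\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p,
\qquad
\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q
```

For example, ARIMA(1,1,1) reads $(1 - \phi_1 B)(1 - B)\,y_t = c + (1 + \theta_1 B)\,\varepsilon_t$: the change in $y_t$ depends on the previous change and on the previous forecast error.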
What About Seasonality?
Many time series show regular patterns that repeat:
How do we account for these repeating patterns?
SARIMA: Adding Seasonality
SARIMA: Seasonal ARIMA (ARIMA with seasonality => SARIMA)
Notation: ARIMA(p,d,q)(P,D,Q)[m]
(p,d,q): Regular ARIMA components
(P,D,Q): Seasonal equivalents
m: Length of seasonal cycle (e.g., 12 for monthly data)
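The notation can be unpacked into a single equation. This is a sketch in one common convention, with symbols assumed for illustration: $B$ is the backshift operator ($B y_t = y_{t-1}$), lowercase polynomials act at lag 1, and uppercase polynomials act at the seasonal lag $m$.

```latex
\phi(B)\,\Phi(B^m)\,(1 - B)^d\,(1 - B^m)^D\, y_t = c + \theta(B)\,\Theta(B^m)\,\varepsilon_t
```

```latex
\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p, \qquad
\Phi(B^m) = 1 - \Phi_1 B^m - \dots - \Phi_P B^{Pm},
\\
\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q, \qquad
\Theta(B^m) = 1 + \Theta_1 B^m + \dots + \Theta_Q B^{Qm}
```

With monthly data (m = 12), the factor $(1 - B^{12})$ compares each month with the same month one year earlier.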
SARIMA: Two Layers of Patterns
SARIMA: Real-World Example
Monthly retail sales:
Regular pattern: This month’s sales relate to last month’s (AR)
Seasonal pattern: December is always high (seasonal AR)
Regular adjustment: Correcting for recent forecast errors (MA)
Seasonal adjustment: Learning from last year’s holiday season errors
(seasonal MA)
What About External Factors?
Time series often respond to outside influences:
Energy consumption during an economic recession
Product sales during a marketing campaign
Traffic patterns during construction
How do we incorporate these external factors?
ARIMAX: Adding External Variables
ARIMAX: ARIMA with eXogenous variables (ARIMA with other factors => ARIMAX)
Intuition: Combining time patterns with external influences
Examples of external factors:
Economic indicators (GDP, unemployment)
Policy changes (tax rates, regulations)
Crises (pandemics, natural disasters)
Planned events (holidays, promotions)
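One common formulation, and the one that the R output later in this deck labels "Regression with ARIMA errors", adds the external variables as a regression term and lets the remainder follow an ARIMA process. The symbols are assumptions of this sketch: $x_t$ is the vector of external variables at time $t$, $\beta$ their coefficients, and $\eta_t$ the part of the series they do not explain.

```latex
y_t = \beta^{\top} x_t + \eta_t,
\qquad
\phi(B)\,(1 - B)^d\, \eta_t = \theta(B)\,\varepsilon_t
```

When $x_t$ is a 0/1 indicator for an event, $\beta$ measures the average shift in the series while the event is active. Other texts define ARIMAX slightly differently, so the label alone does not pin down the equation.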
ARIMAX: Visual Intuition
When to Use Each Model?
ARIMA: When data shows:
No clear seasonality and persistent patterns/trends
Example: Daily stock prices
SARIMA: When data shows:
Regular seasonal patterns
Example: Monthly energy consumption
ARIMAX: When data is affected by:
Known external events or measurable external factors
Example: Energy consumption during recessions/pandemics
Comparing Model Capabilities
Feature                         ARIMA   SARIMA   ARIMAX
Accounts for trends             Yes     Yes      Yes
Captures persistence            Yes     Yes      Yes
Models forecast errors          Yes     Yes      Yes
Handles seasonality             No      Yes      Possible
Incorporates external factors   No      No       Yes
Real-World Analogies
ARIMA: Like forecasting your daily commute time based only on how long
it took the past few days
SARIMA: Like forecasting your commute time knowing that Monday
mornings and Friday evenings always have different patterns (seasonality)
ARIMAX: Like forecasting your commute time considering both past
patterns AND knowing about a planned road construction
Illustration: Energy Consumption Example
Let’s consider energy consumption forecasting:
ARIMA: “Energy use tomorrow depends on today’s use plus recent trends”
SARIMA: “January is always high due to heating, July is high due to
cooling - these seasonal patterns repeat annually”
ARIMAX: “Energy consumption follows its usual patterns BUT drops
significantly during economic recessions and the COVID-19 pandemic”
Model selection should be guided by (1) the characteristics of your data;
(2) the forecasting goal; (3) the availability of external information
Illustration: Energy Consumption Example
Real-world application using U.S. energy consumption data (1973-2024)
Step-by-step modeling process:
Data exploration and properties assessment
Model specification and diagnostics
Forecasting and validation
Handling structural breaks and exogenous shocks
Primary Energy Consumption Data
library(forecast)   # autoplot() for ts objects, Arima(), auto.arima()
library(ggplot2)    # ggtitle(), ylab(), theme_minimal()

# Load energy consumption data
energy_data <- read.csv("enercons_quadbtu.csv")

# Create a time series object starting from January 1973 with monthly frequency
energy_ts <- ts(energy_data$[Link],
                start=c(1973, 1), frequency=12)

# Time series plot
autoplot(energy_ts) +
  ggtitle("Monthly U.S. Total Primary Energy Consumption (1973-2024)") +
  ylab("Quadrillion BTU") +
  theme_minimal()
Seasonal Patterns in Energy Consumption
# Seasonal plot
ggseasonplot(energy_ts, year.labels=TRUE, year.labels.left=TRUE) +
  ggtitle("Seasonal Plot: U.S. Energy Consumption") +
  ylab("Quadrillion BTU")
Time Series Decomposition
# Time series decomposition
energy_decomp <- stl(energy_ts, s.window="periodic")
autoplot(energy_decomp) +
  ggtitle("Decomposition of U.S. Energy Consumption")
Stationarity Assessment
Before fitting ARIMA models, we need to check for stationarity
Non-stationary time series need to be transformed (differencing)
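library(tseries)   # provides kpss.test()

# KPSS test for stationarity (H0: the series is level-stationary)
kpss.test(energy_ts)
KPSS Test for Level Stationarity
data: energy_ts
KPSS Level = 7.001, Truncation lag parameter = 6, p-value = 0.01
# p-value < 0.05: reject H0 of stationarity, so the series is non-stationary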
Differencing for Stationarity
par(mfrow=c(2,1), mar=c(4,4,2,1))

# Plot original series and first difference
plot(energy_ts, main="Original Series", ylab="Quad BTU")

# Regular differencing (first difference)
energy_diff1 <- diff(energy_ts, differences=1)
plot(energy_diff1, main="First Difference", ylab="Change in Quad BTU")
par(mfrow=c(1,1))
Seasonal Differencing
# Seasonal differencing and both regular+seasonal
par(mfrow=c(2,1), mar=c(4,4,2,1))

# Seasonal differencing
energy_season_diff <- diff(energy_ts, lag=12)
plot(energy_season_diff, main="Seasonal Difference (lag=12)",
     ylab="Seasonal Change")

# Both regular and seasonal differencing
energy_both_diff <- diff(diff(energy_ts, lag=12), differences=1)
plot(energy_both_diff, main="Regular + Seasonal Differencing",
     ylab="Change")
par(mfrow=c(1,1))
ACF and PACF Analysis
par(mfrow=c(2,1))
Acf(energy_both_diff, main="ACF of Regular+Seasonal Differenced Series", lag.max=36)
Pacf(energy_both_diff, main="PACF of Regular+Seasonal Differenced Series", lag.max=36)
par(mfrow=c(1,1))
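How such plots are usually read can be sketched on simulated data where the true orders are known. Everything here is illustrative: the coefficient 0.7, the seed, and the variable names are assumptions, and base R's acf() and pacf() stand in for the Acf() and Pacf() wrappers used above.

```r
set.seed(1)                        # arbitrary seed
n <- 5000
ar1 <- arima.sim(model = list(ar = 0.7), n = n)   # pure AR(1)
ma1 <- arima.sim(model = list(ma = 0.7), n = n)   # pure MA(1)

# AR(p): the PACF cuts off after lag p
pacf_ar1 <- as.numeric(pacf(ar1, lag.max = 5, plot = FALSE)$acf)      # lags 1..5

# MA(q): the ACF cuts off after lag q
acf_ma1 <- as.numeric(acf(ma1, lag.max = 5, plot = FALSE)$acf)[-1]    # drop lag 0

print(round(rbind(pacf_ar1, acf_ma1), 2))
```

In the AR(1) row only the first partial autocorrelation is far from zero, and in the MA(1) row only the first autocorrelation is; real series rarely give such clean cut-offs, which is why information criteria are used alongside the plots.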
ARIMA Models
ARIMA(p,d,q) (AutoRegressive Integrated Moving Average) models combine:
AR(p) autoregressive terms,
I(d) differencing, and
MA(q) moving average terms.
Estimation approach:
Automatic selection using information criteria
Manual specification based on ACF/PACF patterns
ARIMA Model Comparison
library(knitr)      # kable()
library(magrittr)   # %>% pipe

# Automatic ARIMA selection
auto_arima <- auto.arima(energy_ts, seasonal=FALSE,
                         stepwise=TRUE, approximation=FALSE)

# Manual ARIMA selection (example models)
arima1 <- Arima(energy_ts, order=c(1,1,1))
arima2 <- Arima(energy_ts, order=c(2,1,2))
arima3 <- Arima(energy_ts, order=c(1,1,2))

# Compare models using AIC/BIC
models <- list(auto_arima, arima1, arima2, arima3)
AICs <- sapply(models, function(x) x$aic)
BICs <- sapply(models, function(x) x$bic)
data.frame(Model=c(paste0("ARIMA(", toString(arimaorder(auto_arima)),")"),
                   "ARIMA(1,1,1)", "ARIMA(2,1,2)", "ARIMA(1,1,2)"),
           AIC=round(AICs, 2), BIC=round(BICs, 2)) %>% kable()
ARIMA Model Comparison
Model          AIC      BIC
ARIMA(1,1,5)   649.67   680.70
ARIMA(1,1,1)   797.53   810.83
ARIMA(2,1,2)   784.73   806.89
ARIMA(1,1,2)   782.21   799.94
ARIMA Model Diagnostics
# Select best ARIMA model based on AIC
best_arima_index <- which.min(AICs)
best_arima <- models[[best_arima_index]]

# Check residuals of the best ARIMA model
checkresiduals(best_arima)
Ljung-Box test
data: Residuals from ARIMA(1,1,5)
Q* = 688.63, df = 18, p-value < 2.2e-16
Model df: 6. Total lags used: 24
The very small p-value rejects the null of uncorrelated residuals: the
non-seasonal ARIMA leaves substantial autocorrelation unexplained.
SARIMA Models
SARIMA(p,d,q)(P,D,Q)[m] (Seasonal ARIMA) extends ARIMA by adding
seasonal components:
(p,d,q): Non-seasonal components
(P,D,Q): Seasonal components
P: Seasonal autoregressive order
D: Seasonal differencing order
Q: Seasonal moving average order
m: Seasonal period (12 for monthly data)
SARIMA Model Comparison
# Automatic SARIMA selection
auto_sarima <- auto.arima(energy_ts, seasonal=TRUE,
                          stepwise=TRUE, approximation=FALSE)

# Manual SARIMA selection
sarima1 <- Arima(energy_ts, order=c(1,1,1), seasonal=c(1,1,1))
sarima2 <- Arima(energy_ts, order=c(0,1,1), seasonal=c(0,1,1))
sarima3 <- Arima(energy_ts, order=c(1,1,1), seasonal=c(0,1,1))

# Compare SARIMA models
sarima_models <- list(auto_sarima, sarima1, sarima2, sarima3)
sarima_AICs <- sapply(sarima_models, function(x) x$aic)
sarima_BICs <- sapply(sarima_models, function(x) x$bic)
SARIMA Model Comparison
Model                      AIC       BIC
SARIMA(2,0,1)(2,1,2)[12]   -356.65   -321.33
SARIMA(1,1,1)(1,1,1)[12]   -332.55   -310.49
SARIMA(0,1,1)(0,1,1)[12]   -310.53   -297.29
SARIMA(1,1,1)(0,1,1)[12]   -332.09   -314.44
SARIMA Model Diagnostics
# Select best SARIMA model
best_sarima_index <- which.min(sarima_AICs)
best_sarima <- sarima_models[[best_sarima_index]]

# Check residuals
checkresiduals(best_sarima)
Ljung-Box test
data: Residuals from ARIMA(2,0,1)(2,1,2)[12]
Q* = 28.618, df = 17, p-value = 0.03822
Model df: 7. Total lags used: 24
The residual autocorrelation is far smaller than for the non-seasonal ARIMA
(Q* = 688.63), although it remains marginally significant at the 5% level.
SARIMA Fitted Values
# Plot fitted values against actual values
autoplot(energy_ts) +
  autolayer(fitted(best_sarima), series="Fitted") +
  ggtitle("Actual vs. Fitted Values - SARIMA") +
  ylab("Quadrillion BTU") +
  theme_minimal()
ARIMAX Models
ARIMAX (ARIMA with eXogenous variables) extends ARIMA by incorporating
external covariates
Useful for modeling exogenous shocks (known events that impact the
series) to energy consumption:
Economic recessions
COVID-19 pandemic
Policy changes
Weather events
ARIMAX Models
Let’s generate dummy variables to account for the economic impacts of the
2008 recession and the COVID-19 pandemic.
# Create dummy variables for recession periods
time_points <- time(energy_ts)
recession_2008 <- rep(0, length(energy_ts))
recession_2008[which(time_points >= 2008 & time_points < 2010)] <- 1

covid_2020 <- rep(0, length(energy_ts))
covid_2020[which(time_points >= 2020 & time_points < 2021.5)] <- 1

# Fit ARIMAX model with external regressors
arimax_model <- auto.arima(energy_ts,
                           seasonal=TRUE,
                           xreg=cbind(recession_2008, covid_2020))
ARIMAX Model Summary
# Summary table for ARIMAX
summary(arimax_model)
Series: energy_ts
Regression with ARIMA(2,0,1)(2,1,1)[12] errors
Coefficients:
         ar1      ar2      ma1    sar1     sar2     sma1  recession_2008  covid_2020
      1.3755  -0.3820  -0.8186  0.0398  -0.1809  -0.8026         -0.1380     -0.2707
s.e.  0.0661   0.0639   0.0442  0.0475   0.0452   0.0308          0.1003      0.0986
sigma^2 = 0.03138: log likelihood = 187.23
AIC=-356.46 AICc=-356.16 BIC=-316.73
Training set error measures:
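The reported AIC can be reproduced from the log likelihood in the summary above. This is a sketch of the arithmetic only; it assumes the usual definition AIC = -2 log L + 2k, with k counting the eight estimated coefficients plus the innovation variance, and the variable names are illustrative.

```r
# Values copied from the ARIMAX summary output
loglik <- 187.23
n_coef <- 8                 # ar1, ar2, ma1, sar1, sar2, sma1, two regressors
k <- n_coef + 1             # plus sigma^2 (assumed parameter count)

# AIC = -2 * log-likelihood + 2 * number of parameters
aic <- -2 * loglik + 2 * k
print(aic)
```

The result matches the AIC=-356.46 line in the output, which also shows why adding regressors only lowers AIC when the gain in log likelihood exceeds one unit per extra parameter.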
ARIMAX Coefficients Interpretation
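Recession 2008:
Coefficient: -0.138 (s.e. 0.100)
During the 2008 recession, monthly energy consumption was lower by
approximately 0.14 quadrillion BTU on average, but the estimate lies within
about 1.4 standard errors of zero, so it is not statistically significant
at the 5% level
COVID-19 Pandemic:
Coefficient: -0.2707 (s.e. 0.0986)
During the COVID-19 pandemic, monthly energy consumption was lower by
approximately 0.27 quadrillion BTU on average, an effect about 2.7
standard errors from zero and statistically significant at the 5% level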
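A rough significance check for the two coefficients can be sketched from the estimates and standard errors in the model summary. It assumes a normal approximation for the estimates (a Wald-type z test); the variable names are illustrative.

```r
# Estimates and standard errors copied from the ARIMAX summary
est <- c(recession_2008 = -0.1380, covid_2020 = -0.2707)
se  <- c(recession_2008 =  0.1003, covid_2020 =  0.0986)

# z statistic and two-sided p-value under a normal approximation (assumption)
z <- est / se
p_value <- 2 * pnorm(-abs(z))

# Approximate 95% confidence intervals
ci_lower <- est - 1.96 * se
ci_upper <- est + 1.96 * se

print(round(cbind(estimate = est, se = se, z = z, p_value = p_value,
                  ci_lower = ci_lower, ci_upper = ci_upper), 3))
```

Under this approximation the interval for the recession dummy includes zero while the interval for the COVID-19 dummy does not, so only the latter effect is clearly distinguishable from noise.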
ARIMA vs SARIMA vs ARIMAX: Performance During Crisis Periods
Crisis Periods Comparison
         2008 Recession              COVID-19
Model    [Link]  [Link]  [Link]      [Link]  [Link]  [Link]
ARIMA    0.3736  0.4535  4.8424      0.3699  0.5098  5.1591
SARIMA   0.1291  0.1708  1.6646      0.2167  0.2943  2.9904
ARIMAX   0.1372  0.1779  1.7565      0.2051  0.2724  2.8466
ARIMA vs SARIMA vs ARIMAX: Performance During Crisis Periods
ARIMAX shows a modest improvement over SARIMA during the COVID-19 period
but is slightly worse during the 2008 recession; the visual difference is
subtle in both cases
ARIMA models perform notably worse at capturing sudden economic
shocks
COVID-19 presented unique forecasting challenges with deeper, more
abrupt demand destruction
While seasonal patterns explain much variation, external regressors
capture structural breaks not accounted for in seasonal patterns
Forecasting Comparison
# Generate forecasts
arima_fc <- forecast(best_arima, h=24)
sarima_fc <- forecast(best_sarima, h=24)

# Generate ARIMAX forecast
future_recession <- rep(0, 24)  # Assumption: no future recession
future_covid <- rep(0, 24)      # Assumption: no future COVID impact

# Column names match the regressors used when fitting arimax_model
arimax_fc <- forecast(arimax_model, h=24,
                      xreg=cbind(recession_2008=future_recession,
                                 covid_2020=future_covid))
Model Validation: Out-of-Sample Testing
# Split data into training and test sets (using last 24 months as test)
train_end_year <- 2022
train_end_month <- 12
train <- window(energy_ts, end=c(train_end_year, train_end_month))
test <- window(energy_ts, start=c(train_end_year+1, 1))

# Apply the selected models to the training data
# (model= reuses the coefficients already estimated; it does not re-estimate them)
train_arima <- Arima(train, model=best_arima)
train_sarima <- Arima(train, model=best_sarima)

# Create training period recession and COVID indicators
train_recession <- recession_2008[1:length(train)]
train_covid <- covid_2020[1:length(train)]

# Test period indicators
test_recession <- rep(0, length(test))  # Assuming no recession in test period
test_covid <- rep(0, length(test))      # Assuming no COVID impact in test period
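The hold-out idea can be sketched end to end on a series that ships with R. This is an illustration under stated assumptions, not the deck's evaluation code: it uses the built-in AirPassengers data instead of the energy series, base R's arima() and predict() instead of the forecast package, and an arbitrarily chosen seasonal model that is re-estimated on the training window only.

```r
# Built-in monthly series (1949-1960), log scale to stabilise the variance
y <- log(AirPassengers)

# Hold out the last 24 months
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))

# Estimate a seasonal model on the training data only (illustrative orders)
fit <- arima(train, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the hold-out period and compare with what actually happened
fc  <- predict(fit, n.ahead = length(test))$pred
err <- as.numeric(test) - as.numeric(fc)

rmse <- sqrt(mean(err^2))                            # root mean squared error
mae  <- mean(abs(err))                               # mean absolute error
mape <- 100 * mean(abs(err / as.numeric(test)))      # mean absolute % error

print(round(c(RMSE = rmse, MAE = mae, MAPE = mape), 4))
```

Because the coefficients are estimated without seeing the test window, these error measures reflect genuine out-of-sample performance.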
Forecast Evaluation
Forecast Accuracy Metrics
Model    RMSE    MAE     MAPE   AIC
ARIMA    0.4602  0.4077  5.28    649.67
SARIMA   0.1770  0.1235  1.56   -356.65
ARIMAX   0.1879  0.1247  1.58   -356.46
Structural Breaks in Energy Consumption
library(strucchange)   # efp(), breakpoints()

# Perform CUSUM test for structural breaks
cusum_test <- efp(energy_ts ~ time(energy_ts), type = "Rec-CUSUM")
plot(cusum_test, main = "CUSUM Test for Structural Breaks")
Detected Structural Breaks
# Bai-Perron test for multiple structural breaks
bp_test <- breakpoints(energy_ts ~ time(energy_ts))

# Plot the breaks
plot(energy_ts, main = "Energy Consumption with Detected Breakpoints")
lines(fitted(bp_test, breaks = bp_test$nbreaks), col = "red")
break_years <- time(energy_ts)[bp_test$breakpoints]
abline(v = break_years, col = "blue", lty = 2)
Key Findings & Implications
Best model for energy consumption: SARIMA with RMSE of 0.177
Seasonal patterns are significant and must be accounted for
Structural breaks correspond to major economic events such as the oil crises
(1970s), economic recessions and the COVID-19 pandemic
Modeling external shocks (ARIMAX) improves accuracy during the COVID-19
period, but not overall accuracy or AIC relative to SARIMA
Implications for policy:
Energy forecasting needs to account for both seasonal patterns and
unexpected shocks
Models should be periodically updated to capture structural changes
Model Selection Guidelines
ARIMA is suitable when data shows no significant seasonal patterns
SARIMA is preferred when strong seasonal patterns exist (e.g., monthly
energy consumption)
ARIMAX is most appropriate when:
Known external factors influence the series
Historical shocks need to be modeled explicitly
Future scenarios involving potential shocks are being evaluated