Last Revised: 07/25/2023
2024 Level 2 - Quantitative Methods
Learning Modules
Basics of Multiple Regression and Underlying Assumptions
Evaluating Regression Model Fit and Interpreting Model Results
Model Misspecification
Extensions of Multiple Regression
Time-Series Analysis
Machine Learning
Big Data Projects
Review
This document should be used in conjunction with the corresponding learning modules in the 2024 Level 2 CFA® Program
curriculum. Some of the graphs, charts, tables, examples, and figures are copyright 2023, CFA Institute. Reproduced and
republished with permission from CFA Institute. All rights reserved.
Required disclaimer: CFA Institute does not endorse, promote, or warrant accuracy or quality of the products or services
offered by [Link]. CFA Institute, CFA®, and Chartered Financial Analyst® are trademarks owned by CFA
Institute.
© 2533695 Ontario Limited d/b/a [Link]. All rights reserved.
Basics of Multiple Regression and Underlying Assumptions
a. describe the types of investment problems addressed by multiple linear regression
and the regression process
b. formulate a multiple linear regression model, describe the relation between the
dependent variable and several independent variables, and interpret estimated
regression coefficients
c. explain the assumptions underlying a multiple linear regression model and
interpret residual plots indicating potential violations of these assumptions
Basics of Multiple Regression
Main tasks: specify the model; interpret the output
Multiple regression used to:
identify relationships between variables
test existing theories
forecast value of a DV
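- model:
  $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} + \varepsilon_i$, $i = 1, \ldots, n$, with $n > k$
  - deterministic part: $b_0 + b_1 X_{1i} + \cdots + b_k X_{ki}$ ($b_0$ = intercept; $b_1, \ldots, b_k$ = slope coefficients on the $k$ IVs)
  - stochastic part: $\varepsilon_i$
- partial slope coefficients: $b_1, \ldots, b_k$
  $\hat{b}$ ➞ estimated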
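A minimal sketch of how the estimated coefficients could be obtained by ordinary least squares, assuming the usual normal equations (X'X)b = X'y. The helper names `solve_linear_system` and `fit_ols` and the data are made up for illustration; in practice the coefficients come from statistical software output.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(y, xs):
    """Estimate b0..bk for Y_i = b0 + b1*X1_i + ... + bk*Xk_i + e_i.

    xs is a list of k columns (one per IV), each of length n.
    """
    n, k = len(y), len(xs)
    if n <= k:
        raise ValueError("need more observations than IVs (n > k)")
    # Each design row is [1, X1_i, ..., Xk_i]; the leading 1 carries the intercept.
    rows = [[1.0] + [col[i] for col in xs] for i in range(n)]
    p = k + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from Y = 1 + 2*X1 - 3*X2 (no error term),
# so the estimates should recover b0 = 1, b1 = 2, b2 = -3.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
b_hat = fit_ols(y, [x1, x2])
```

With real data the error term is not zero, so the estimates only approximate the underlying coefficients.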
* Describe the types of investment problems addressed by multiple linear regression and the regression process
partial slope coefficient: measures ∆DV for a 1 unit ∆IV, holding all other IVs constant
e.g./ RET = 0.0023 - 5.0585 BY - 2.1901 CS
      (RET = return, BY = bond yield, CS = credit spread)
when both IVs = 0, RET = 0.0023
BY ↑ 1 (CS held constant), RET ↓ 5.0585
CS ↓ 1 (BY held constant), RET ↑ 2.1901
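The interpretation above can be checked with a short sketch that plugs values into the estimated equation. The function name `predict_ret` is hypothetical; the coefficients are the ones in the example.

```python
def predict_ret(by, cs):
    """Predicted return from the estimated model in the notes."""
    return 0.0023 - 5.0585 * by - 2.1901 * cs


# Intercept: predicted RET when both IVs are zero.
base = predict_ret(0.0, 0.0)

# Partial slope on BY: raise BY by 1 unit while CS is held constant.
delta_by = predict_ret(1.0, 0.0) - predict_ret(0.0, 0.0)

# Partial slope on CS: lower CS by 1 unit while BY is held constant.
delta_cs = predict_ret(0.0, -1.0) - predict_ret(0.0, 0.0)
```

Here `delta_by` comes out at about -5.0585 and `delta_cs` at about +2.1901, matching the signs read off the equation.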
Assumptions/
1/ Linearity - the relationship between the DV and the IVs is linear
2/ Homoskedasticity - the variance of the regression residuals is the same for all observations, i.e. $\mathrm{Var}(\varepsilon_i) = \mathrm{Var}(\varepsilon_j)$
* Formulate a multiple linear regression model, describe the relation between the dependent variable and several independent variables, and interpret estimated regression coefficients
Assumptions/
3/ Independence of errors - the observations are independent of
one another
∴ regression residuals are uncorrelated across observations
4/ Normality - regression residuals are normally distributed
5/ Independence of IVs
   a/ IVs are not random (i.e. they have a specific value)
   b/ no exact linear relationship between 2 or more IVs
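One way to see what "no exact linear relationship between 2 or more IVs" rules out is to check the rank of the design matrix: if one IV is an exact linear combination of the others, the rank falls below the number of columns. This is only an illustrative sketch; the helper names and the data are assumptions, not part of the curriculum.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def has_exact_linear_relationship(ivs):
    """True if the columns [1, IV1, ..., IVk] are linearly dependent."""
    n = len(ivs[0])
    design = [[1.0] + [col[i] for col in ivs] for i in range(n)]
    return matrix_rank(design) < len(ivs) + 1


# Hypothetical IVs: x3 is built as exactly 2*x1 + x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
x3 = [2 * a + b for a, b in zip(x1, x2)]

ok = has_exact_linear_relationship([x1, x2])        # independent IVs
bad = has_exact_linear_relationship([x1, x2, x3])   # violates the assumption
```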
Scatterplot matrix (pairs plot)
- uses simple linear regression:
  DV vs. each IV ➞ want to see linear relationships
  each IV vs. the other IVs ➞ don't want to see linear relationships
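A numeric companion to the pairs plot is the set of pairwise correlations between the DV and each IV and between the IVs themselves. The sketch below uses made-up data and a hand-written `pearson` helper; it is an illustration, not a substitute for reading the regression output.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical data for one DV and two IVs.
series = {
    "DV": [1.0, 2.1, 2.9, 4.2, 5.0],
    "IV1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "IV2": [2.0, 1.0, 4.0, 3.0, 5.0],
}

# One correlation per pair, mirroring the panels of a scatterplot matrix.
corr = {
    (a, b): pearson(series[a], series[b])
    for a in series
    for b in series
    if a < b
}
```

A strong correlation in a DV-IV pair is the pattern one hopes to see; a strong correlation in an IV-IV pair is the one to worry about.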
* Explain the assumptions underlying a multiple linear regression model and interpret residual plots indicating potential violations of these assumptions
Scatterplot Matrix
[Figure: scatterplot matrix of the DV and the IVs; panel annotations: slight pos. relationship, pos. linear, neg. linear, almost no 'apparent' linear relationship]
- since we can, and will, interpret output ➞ this is not a very useful step
- any violations will be identified statistically, not visually
- almost no 'apparent' linear relationship ➞ $b_{SMB}$ is sig. in the output however
[Figure: residual plots]
- helps identify outliers
- helps visually assess homoskedasticity
[Figure panels: dispersion appears fairly constant and random; non-constant variance; correlated errors (autocorrelation)]
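The two patterns in the residual plots can be given a rough numeric counterpart. The sketch below is a crude illustration, not a formal test: it assumes the residuals are already ordered (by predicted value for the variance check, by time for the autocorrelation check), and the function names and data are hypothetical.

```python
def lag1_autocorrelation(resid):
    """Correlation between each residual and the one before it."""
    n = len(resid)
    mean = sum(resid) / n
    num = sum((resid[t] - mean) * (resid[t - 1] - mean) for t in range(1, n))
    den = sum((e - mean) ** 2 for e in resid)
    return num / den


def sample_variance(values):
    """Sample variance with an n - 1 denominator."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def variance_ratio(resid):
    """Variance of the second half of the residuals over the first half."""
    half = len(resid) // 2
    return sample_variance(resid[half:]) / sample_variance(resid[:half])


# Hypothetical residuals whose spread widens: non-constant variance.
fanning = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
ratio = variance_ratio(fanning)

# Hypothetical residuals that drift steadily: correlated errors.
trending = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
rho = lag1_autocorrelation(trending)
```

A ratio far from 1 or a lag-1 correlation far from 0 is the numeric analogue of the patterns to avoid in the plots.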
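[Figure: regression residuals vs. an IV]
- outliers affect parameter values
- directional relationship between the IV and ε, i.e. corr(IV, ε) = misspecified model
Q-Q plot: standardized residuals vs. normal distribution
- standardized residual: $(\varepsilon_i - \bar{\varepsilon}) / \sigma_\varepsilon$
- Z score of -1.65 / +1.65 marks the 5% in each tail; a value of ε beyond these = outlier
- normally distributed ε should fall on the straight diagonal line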
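A minimal sketch of the standardization and the ±1.65 cutoff, using only the Python standard library. The residuals are made up, the n - 1 denominator for $\sigma_\varepsilon$ and the (i + 0.5)/n plotting positions are assumptions, and the names are hypothetical.

```python
import math
from statistics import NormalDist


def standardized_residuals(resid):
    """(e_i - mean) / sd, with sd computed using an n - 1 denominator."""
    n = len(resid)
    mean = sum(resid) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in resid) / (n - 1))
    return [(e - mean) / sd for e in resid]


# Hypothetical residuals; the last one is deliberately extreme.
resid = [0.10, -0.20, 0.05, -0.10, 0.15, -0.05, 0.00, 2.00]
z = standardized_residuals(resid)

# Z score that leaves 5% in the upper tail (about 1.645, i.e. the 1.65 in the notes).
cutoff = NormalDist().inv_cdf(0.95)
outliers = [i for i, zi in enumerate(z) if abs(zi) > cutoff]

# Q-Q pairs: theoretical normal quantile vs. sorted standardized residual.
n = len(z)
theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
qq_pairs = list(zip(theoretical, sorted(z)))
```

Plotting `qq_pairs` gives the Q-Q plot; points that sit far from the straight reference line, like the flagged observation here, are the outliers.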
Evaluating Regression Model Fit and Interpreting Model Results
a. evaluate how well a multiple regression model explains the dependent variable by
analyzing ANOVA table results and measures of goodness of fit
b. formulate hypotheses on the significance of two or more coefficients in a multiple
regression model and interpret the results of the joint hypothesis tests
c. calculate and interpret a predicted value for the dependent variable, given the
estimated regression model and assumed values for the independent variable
Evaluating Model Fit/Interpreting Results
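Coefficient of Determination ➞ $R^2 = \dfrac{\text{sum of squares regression}}{\text{sum of squares total}} = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$
   (sum of squares regression = explained; $\hat{Y}_i$ = predicted, $Y_i$ = observed, $\bar{Y}$ = average)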
as IVs are added, 𝐑𝟐 will increase or stay the same
- never decreases
no information on coefficient significance
no information on regression violations that may cause
coefficients to be biased
poor gauge of model fit - overfitting creates a bad model
- low in-sample error, high
out-of-sample error
Adjusted $R^2$ ➞ $\bar{R}^2 = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)} = 1 - \left[\left(\dfrac{n-1}{n-k-1}\right)(1 - R^2)\right]$
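The sketch below computes R² and adjusted R² from observed and fitted values, and confirms that the two forms of the adjusted R² formula agree. The data are made up: the fitted values come from a simple OLS fit of y on a single IV (x = 1, 2, 3, 4, giving fitted = 0.5 + 1.6x), so SSR + SSE = SST holds. All names are hypothetical.

```python
def sums_of_squares(observed, fitted):
    """Return (SST, SSR, SSE) for observed values and OLS fitted values."""
    mean_y = sum(observed) / len(observed)
    sst = sum((y - mean_y) ** 2 for y in observed)
    ssr = sum((f - mean_y) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return sst, ssr, sse


def adjusted_r2_from_sums(sse, sst, n, k):
    """First form: 1 - [SSE/(n-k-1)] / [SST/(n-1)]."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


def adjusted_r2_from_r2(r2, n, k):
    """Second form: 1 - [(n-1)/(n-k-1)] * (1 - R^2)."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)


# Hypothetical observations and their OLS fitted values (k = 1 IV, n = 4).
observed = [2.0, 4.0, 5.0, 7.0]
fitted = [2.1, 3.7, 5.3, 6.9]
n, k = len(observed), 1

sst, ssr, sse = sums_of_squares(observed, fitted)
r2 = ssr / sst
adj_a = adjusted_r2_from_sums(sse, sst, n, k)
adj_b = adjusted_r2_from_r2(r2, n, k)
```

Both forms give the same adjusted R², and it sits below R² because k ≥ 1.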
* Evaluate how well a multiple regression model explains the dependent variable by analyzing ANOVA table results and measures of goodness of fit
Adjusted $R^2$ ➞ $\bar{R}^2 = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)} = 1 - \left[\left(\dfrac{n-1}{n-k-1}\right)(1 - R^2)\right]$
FYI: multiply numerator and denominator by $\dfrac{n-1}{SST}$:
$\dfrac{SSE/(n-k-1)}{SST/(n-1)} = \dfrac{SSE\,(n-1)}{SST\,(n-k-1)} = \left(\dfrac{n-1}{n-k-1}\right)\left(\dfrac{SSE}{SST}\right)$
➞ SSR + SSE = SST ➞ $\dfrac{SSR}{SST} + \dfrac{SSE}{SST} = 1$
$R^2 + \dfrac{SSE}{SST} = 1$
∴ $\dfrac{SSE}{SST} = 1 - R^2$
- for $\bar{R}^2$: if $k \ge 1$, $\bar{R}^2 < R^2$
  & if coefficient's |t-stat| > 1, $\bar{R}^2$ ↑, else $\bar{R}^2$ ↓
application/
$R^2 = \dfrac{SSR}{SST} = \dfrac{90.6234}{147.2416} = 0.6155$
$\bar{R}^2 = 1 - \left[\left(\dfrac{50-1}{50-5-1}\right)(1 - 0.6155)\right] = 0.5718$
$\bar{R}^2$ ↑ with Factor 1, 3, 4
$\bar{R}^2$ ↓ with Factor 2 & 5 (also insignificant)
F1 + F2: $\bar{R}^2$ ↓, add F3: $\bar{R}^2$ ↑, add F4: $\bar{R}^2$ ↑, add F5: $\bar{R}^2$ ↓
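The arithmetic in the application can be reproduced in a few lines. The inputs (SSR, SST, n = 50, k = 5) are the ones on the slide; the variable names are just for illustration.

```python
# Inputs taken from the application above.
ssr = 90.6234
sst = 147.2416
n = 50  # observations
k = 5   # independent variables (factors)

# Coefficient of determination.
r2 = ssr / sst

# Adjusted R^2 using the (n - 1) / (n - k - 1) form.
adj_r2 = 1 - ((n - 1) / (n - k - 1)) * (1 - r2)

r2_rounded = round(r2, 4)
adj_r2_rounded = round(adj_r2, 4)
```

Rounded to four places these match the 0.6155 and 0.5718 shown in the application.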
$\bar{R}^2$ ➞ no intuitive explanation, re: %'age of variance explained
➞ no information on coefficient significance or potential coefficient bias
➞ not a 'goodness of fit' measure
AIC - Akaike's Information Criterion/ evaluates a collection of models that explain the same DV
   $AIC = n \ln(SSE/n) + 2(k + 1)$
   - lower = better
   - adding IVs may lower SSE, but never raise it
   - $2(k + 1)$ = penalty term
BIC - Schwarz's Bayesian Information Criterion/
   $BIC = n \ln(SSE/n) + \ln(n)(k + 1)$
   - lower = better
   - since $\ln(n) > 2$ (true for n ≥ 8), BIC assesses a greater penalty
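A short sketch of the two criteria as written above. For illustration it reuses the earlier application's figures (n = 50, k = 5) and takes SSE = SST - SSR = 147.2416 - 90.6234; the function names are hypothetical.

```python
import math


def aic(n, sse, k):
    """AIC = n * ln(SSE / n) + 2 * (k + 1)."""
    return n * math.log(sse / n) + 2 * (k + 1)


def bic(n, sse, k):
    """BIC = n * ln(SSE / n) + ln(n) * (k + 1)."""
    return n * math.log(sse / n) + math.log(n) * (k + 1)


# SSE implied by the earlier application: SST - SSR.
n, k = 50, 5
sse = 147.2416 - 90.6234

model_aic = aic(n, sse, k)
model_bic = bic(n, sse, k)

# The two criteria differ only in the penalty per estimated parameter.
penalty_gap = model_bic - model_aic  # equals (ln(n) - 2) * (k + 1)
```

The values only mean something when compared across candidate models for the same DV: the model with the lower AIC (or BIC) is preferred, and because ln(50) > 2 the BIC penalizes the extra parameters more heavily here.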