© 2023 FinTree Education Pvt. Ltd.
Eg. r = 0.4, n = 62, confidence level = 95%. Perform a test of significance.
Step 1: Define hypothesis: H0: r = 0, Ha: r ≠ 0
Step 2: Calculate test statistic:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} = \frac{0.4 \times \sqrt{62-2}}{\sqrt{1-0.4^{2}}} = 3.3806$$
Step 3: Calculate critical values: t-distribution, DoF = n − 2 = 60, critical values = −2 and +2
Since the calculated test statistic lies outside the range, the conclusion is ‘Reject the null hypothesis’: ‘r’ is statistically significant, which means that the population ‘r’ would be different from zero.
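The test above can be sketched in Python as follows. This is a minimal illustration rather than part of the original notes: the function name `correlation_t_stat` is hypothetical, and the critical value of 2.000 (two-tailed, 5% significance, 60 degrees of freedom) is assumed from a t-table instead of being computed.

```python
import math

def correlation_t_stat(r, n):
    """t-statistic for H0: population correlation = 0, using n - 2 DoF."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.4, 62
t_stat = correlation_t_stat(r, n)
t_crit = 2.000  # assumption: read from a t-table (two-tailed, 5%, DoF = 60)

# Reject H0 when the statistic falls outside the range [-t_crit, +t_crit]
reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.4f}, DoF = {n - 2}, reject H0: {reject_h0}")
```

Keeping the critical value as an explicit input mirrors the table lookup done by hand in Step 3.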
Assumption of the test: The two populations (x and y) follow a normal distribution (normally called a bivariate normal distribution).
If this assumption is violated, use the Spearman rank correlation coefficient.
● Data used does NOT meet distribution requirements
Example : We want to test a hypothesis with respect to mean but sample size is
small and population is non - normal ( z or t-test won’t work!)
● Hypothesis is not with respect to a parameter
Example : Test whether a sample is random or not, or whether a population follows a normal distribution or not
● Data is given in RANKS
Example : Hypothesis test on ranks of fund managers
Some important nonparametric tests
                                             Parametric                   Nonparametric
Tests concerning a single mean               t-test, z-test               Wilcoxon signed-rank test
Tests concerning differences between means   t-test, approximate t-test   Mann-Whitney U test
Tests concerning mean
Introduction to Linear Regression
LOS a Dependent variable and independent variable
Dependent variable
● Aka response variable
● Variable you are seeking to explain
● Also referred to as explained variable/endogenous variable/predicted variable
Independent variable
● Aka the regressor
● Variable you are using to explain changes in the dependent variable
● Also referred to as explanatory variable/exogenous variable/predicting variable
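Rp = RFR + β (Rm − RFR)
Rp: dependent variable, RFR: intercept, β: slope, (Rm − RFR): independent variable
[Figure: regression line, with the dependent variable on the y-axis and the independent variable on the x-axis]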
LOS b Describe the least squares criterion, how it is used to estimate regression coefficients, and their interpretation
Sum of squared errors (SSE): Sum of the squared vertical distances between the estimated and actual Y-values
Regression line: Line that minimizes the SSE
Slope coefficient (beta): Describes change in ‘y’ for one unit change in ‘x’
$$\text{Slope coefficient} = \frac{\text{Cov}(x, y)}{\text{Variance}(x)}$$
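As a rough sketch of the slope formula, the snippet below computes Cov(x, y) / Variance(x) by hand. The helper name `ols_simple` is hypothetical, the four (x, y) pairs are borrowed from the worked standard-error example later in these notes, and the intercept line uses the standard result that the least squares line passes through the point of means, which these notes do not state explicitly.

```python
def ols_simple(x, y):
    """Least squares intercept and slope for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample variance (both divide by n - 1)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    slope = cov_xy / var_x
    # Assumption (standard result): the fitted line passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

x = [10, 15, 20, 30]
y = [17, 19, 35, 45]
b0, b1 = ols_simple(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

The slope is read as the change in ‘y’ for a one unit change in ‘x’.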
LOS c Assumptions underlying linear regression
● Relationship between dependent and independent variable is linear
● Independent variable is uncorrelated with the error term
● Expected value of the error term is zero
● Variance of the error term is constant (NOT ZERO)
● The economic relationship b/w variables is intact for the entire time period (violated by, eg., a change in political regime)
● Error term is uncorrelated across observations (violated by, eg., seasonality)
● Error term is normally distributed
LOS d & e Analysis of variance (ANOVA)
$\bar{Y}$: Mean, $Y_i$: Actual value, $\hat{Y}_i$: Predicted value
Total sum of squares (SST): Measures total variation
$$SST = \sum (Y_i - \bar{Y})^2$$
Regression sum of squares (RSS): Measures explained variation
$$RSS = \sum (\hat{Y}_i - \bar{Y})^2$$
Sum of squared errors (SSE): Measures unexplained variation, aka sum of squared residuals
$$SSE = \sum (Y_i - \hat{Y}_i)^2$$
● Higher the RSS, better the quality of regression
● $R^2$ = RSS / SST
● $R^2$ = Explained variation / Total variation
ANOVA Table
Source of variation       DoF          Sum of squares   Mean sum of squares
Regression (explained)    k            RSS              MSR = RSS/k
Error (unexplained)       n − k − 1    SSE              MSE = SSE/(n − k − 1)
Total                     n − 1        SST
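The ANOVA quantities can be sketched in Python as below. This is an illustrative example under stated assumptions: the function name `anova_simple` is hypothetical, the data are the four (x, y) observations from the worked standard-error example later in these notes, k = 1 because there is a single independent variable, and the F-statistic is taken as MSR/MSE.

```python
def anova_simple(x, y, k=1):
    """ANOVA quantities for a least squares fit with one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    y_hat = [intercept + slope * xi for xi in x]

    sst = sum((yi - mean_y) ** 2 for yi in y)                 # total variation
    rss = sum((yh - mean_y) ** 2 for yh in y_hat)             # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # unexplained variation

    msr = rss / k
    mse = sse / (n - k - 1)
    return {"SST": sst, "RSS": rss, "SSE": sse,
            "R2": rss / sst, "MSR": msr, "MSE": mse, "F": msr / mse}

table = anova_simple([10, 15, 20, 30], [17, 19, 35, 45])
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

For a least squares fit with an intercept, SST equals RSS plus SSE, which is why the two R² expressions agree.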
F-statistic = MSR/MSE with ‘k’ and ‘n − k − 1’ DoF
When to use F-test and t-test
Y = b0 + b1x1 + b2x2 + ε
F-test: tests the regression as a whole
t-test: tests an individual coefficient (b1 or b2)
● To test if the car is operating as a whole: Use F-test
● To test the engine of the car: Use t-test
● To test individual tyres of the car: Use t-test
Standard error of estimate, coefficient of determination and confidence interval for regression coefficient
Eg.
‘x’               10      15      20      30
Actual ‘y’        17      19      35      45
Predicted ‘y’     15.81   23.36   30.91   46.01
Errors            1.19    −4.36   4.09    −1.01
Squared errors    1.41    19      16.7    1.02
Sum of squared errors (SSE) = 38.166
Standard error of estimate (SEE):
$$SEE = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{38.166}{2}} = 4.37$$
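A short Python sketch of the same calculation follows. It reads the predicted values in the example as 15.81, 23.36, 30.91 and 46.01; the variable names are illustrative only, and small differences from the figures in the notes come from rounding of the predicted values.

```python
import math

actual = [17, 19, 35, 45]
predicted = [15.81, 23.36, 30.91, 46.01]  # predicted 'y' values as read from the example

# Error = actual - predicted, then square and sum to get SSE
errors = [a - p for a, p in zip(actual, predicted)]
sse = sum(e ** 2 for e in errors)

# Standard error of estimate uses n - 2 degrees of freedom in simple regression
n = len(actual)
see = math.sqrt(sse / (n - 2))

print(f"SSE = {sse:.3f}, SEE = {see:.2f}")
```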
Coefficient of determination ($R^2$): % variation of the dependent variable explained by % variation of the independent variable
For simple linear regression, $R^2 = r^2$
LOS f Formulate a null and an alternative hypothesis about a population value of a regression coefficient, and determine whether the null hypothesis is rejected at a given level of significance
Eg. $\hat{b}_1$ = 0.48, SE = 0.35, n = 42, confidence level = 90%. Perform a test of significance.
Step 1: Define hypothesis: $H_0: b_1 = 0$, $H_a: b_1 \neq 0$
Step 2: Calculate test statistic (HV = hypothesized value):
$$t = \frac{\text{Sample stat.} - \text{HV}}{\text{Std. error}} = \frac{0.48 - 0}{0.35} = 1.371$$
Step 3: Calculate critical values: t-distribution, DoF = n − 2 = 40, critical values = −1.684 and +1.684
Since the calculated test statistic lies inside the range, the conclusion is ‘Fail to reject the null hypothesis’: the slope is not significantly different from zero.
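The two-tailed slope test can be sketched in Python as below. This is a minimal illustration, not the notes' own procedure: the function name `slope_t_test` is hypothetical, and the critical value 1.684 is passed in as given in the example rather than computed from the t-distribution.

```python
def slope_t_test(b_hat, hypothesized, std_error, t_crit):
    """Two-tailed t-test for a regression slope.

    Returns the test statistic and whether H0 is rejected.
    """
    t_stat = (b_hat - hypothesized) / std_error
    # Reject H0 only if the statistic lies outside [-t_crit, +t_crit]
    return t_stat, abs(t_stat) > t_crit

# Values from the example: slope estimate 0.48, SE 0.35, DoF = 42 - 2 = 40
t_stat, reject_h0 = slope_t_test(0.48, 0.0, 0.35, t_crit=1.684)
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```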
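One-tailed test
Eg. $\hat{b}_1$ = 0.48, SE = 0.35, n = 42, confidence level = 90%. Perform a test of significance.
Step 1: Define hypothesis: $H_0: b_1 \leq 0$, $H_a: b_1 > 0$
Step 2: Calculate test statistic (HV = hypothesized value):
$$t = \frac{\text{Sample stat.} - \text{HV}}{\text{Std. error}} = \frac{0.48 - 0}{0.35} = 1.371$$
Step 3: Calculate critical value: t-distribution, DoF = n − 2 = 40, critical value = 1.684. This value leaves 5% in the upper tail; the value that leaves 10% in the upper tail is 1.303.
Since the calculated test statistic is below 1.684, the conclusion is ‘Fail to reject the null hypothesis’: the slope is not significantly greater than zero at that critical value. Against 1.303 the same statistic would lead to rejection.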
LOS g Calculate and interpret the predicted value for the dependent variable, and a prediction interval for it
Predicted value of dependent variable:
$$\hat{Y} = \hat{b}_0 + \hat{b}_1 \times X$$
where $\hat{Y}$ = predicted value (y), $\hat{b}_0$ = intercept, $\hat{b}_1$ = slope, X = forecasted value (x)
Confidence interval for the predicted value of dependent variable:
$$\hat{Y} \pm (t_c \times SE)$$
where $\hat{Y}$ = predicted value (y), $t_c$ = critical value (t-value), SE = standard error
Eg. Forecasted return (x) = 12%, Intercept = −4%, Slope = 0.75, Standard error = 2.68, n = 32. Calculate predicted value (y) and 95% confidence interval.
Predicted value:
$$\hat{Y} = \hat{b}_0 + \hat{b}_1 \times X_p = -4 + 0.75 \times 12 = 5\%$$
Confidence interval (t-distribution, DoF = n − 2 = 30, $t_c$ = 2.042):
$$\hat{Y} \pm (t_c \times SE) = 5 \pm (2.042 \times 2.68) = -0.47\% \text{ to } 10.47\%$$
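The predicted value and its interval can be sketched in Python as follows. The function name `predict_with_interval` is hypothetical, and both the critical value 2.042 and the standard error 2.68 are taken as given in the example rather than derived.

```python
def predict_with_interval(intercept, slope, x, t_crit, std_error):
    """Predicted y and the interval y_hat +/- (t_crit * std_error)."""
    y_hat = intercept + slope * x
    margin = t_crit * std_error
    return y_hat, y_hat - margin, y_hat + margin

# Values from the example (all in % terms): x = 12, intercept = -4, slope = 0.75
y_hat, lower, upper = predict_with_interval(
    intercept=-4.0, slope=0.75, x=12.0, t_crit=2.042, std_error=2.68
)
print(f"predicted = {y_hat:.2f}%, interval = {lower:.2f}% to {upper:.2f}%")
```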
LOS h Describe different functional forms of simple linear regressions
Log-lin model: Dependent variable is logarithmic but the independent variable is linear
Lin-log model: Dependent variable is linear but the independent variable is logarithmic
Log-log model: Both the dependent variable and the independent variable are logarithmic
Selecting the Correct Functional Form
The key to fitting the appropriate functional form of a simple linear regression is examining the goodness of fit measures:
• The coefficient of determination (R2),
• The F-statistic,
• The standard error of the estimate (se),
• As well as examining whether there are patterns in the residuals
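One way to compare functional forms is to transform the variables, refit, and look at the fit measures, as in the sketch below. Everything here is an assumption for illustration: the helper `fit_r2` is hypothetical, the data are synthetic (generated exactly from ln(y) = 0.5 + 0.2x, so the log-lin form should fit perfectly), and only R² is compared. R² values for models with different dependent variables (y versus ln y) are only a rough guide, so residual patterns should still be checked.

```python
import math

def fit_r2(x, y):
    """Least squares fit of y on x; returns (intercept, slope, R-squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - sse / sst

# Synthetic data (assumption): y grows exponentially in x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(0.5 + 0.2 * xi) for xi in x]
ln_x = [math.log(xi) for xi in x]
ln_y = [math.log(yi) for yi in y]

# Each functional form is a plain linear fit on transformed variables
forms = {
    "lin-lin": (x, y),
    "log-lin": (x, ln_y),     # dependent variable in logs
    "lin-log": (ln_x, y),     # independent variable in logs
    "log-log": (ln_x, ln_y),  # both variables in logs
}
results = {name: fit_r2(xs, ys) for name, (xs, ys) in forms.items()}
for name, (b0, b1, r2) in results.items():
    print(f"{name}: intercept = {b0:.3f}, slope = {b1:.3f}, R2 = {r2:.4f}")
```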